ABSTRACT

An aggregated method of nonparametric estimators based on time-domain and state-domain
estimators is proposed and studied. To attenuate the curse of dimensionality, we propose
a factor modeling strategy. We first investigate the asymptotic behavior of nonparametric
estimators of the volatility matrix in the time domain and in the state domain. Asymptotic
normality is separately established for nonparametric estimators in the time domain and
state domain. These two estimators are asymptotically independent. Hence, they can be
combined, through a dynamic weighting scheme, to improve the efficiency of volatility
matrix estimation. The optimal dynamic weights are derived, and it is shown that the
aggregated estimator uniformly dominates volatility matrix estimators using time-domain
or state-domain smoothing alone. A simulation study, based on an essentially affine model
for the term structure, is conducted, and it demonstrates convincingly that the newly
proposed procedure outperforms both time- and state-domain estimators. Empirical studies
further endorse the advantages of our aggregated method.

KEYWORDS: aggregation, nonparametric function estimation, diffusion, volatility 
matrix, factor, local time, affine model. 



Covariance matrices are fundamental for risk management, asset pricing, proprietary
trading, and portfolio management. In forecasting a future event such as the volatility
matrix, two pieces of information are frequently consulted. Based on the recent history,
one uses a form of local average, such as the moving average, to predict the volatility
matrix. This approach localizes in time and uses the smoothness of the volatility matrix
as a function of time. It completely ignores the historical information, which is related
to the current prediction through a stationarity¹ assumption. On the other hand,
one can predict a future event by consulting the historical information with similar
scenarios. This approach basically localizes in the state variable and depends on the
stationarity assumption. For example, by localizing on a few key financial factors,
one can compute the volatility matrix using the historical information. This results
in a nonparametric estimate of the volatility matrix using state-domain smoothing.
See, for example, Andersen, Bollerslev and Diebold (2002) for a unified framework for
interpreting both parametric and nonparametric approaches to volatility measurement.

1 By "stationarity" we do not mean that the process is strongly stationary, but that it has
some structural invariability over time. For example, the conditional moment functions do
not vary over time.

The aforementioned two estimators are weakly correlated, as they use data that 
are quite far apart in time. They can be combined to improve the efficiency of the 
volatility matrix estimation. This results in an aggregated estimator of the volatility 
matrix. Three challenges arise in the endeavor: the curse of dimensionality, the choice 
of dynamic weights, and the mathematical complexity. 

Due to the curse of dimensionality, surface smoothing techniques are not very useful 
in practice when there are more than two or three predictor variables. An efficient 
dimensionality reduction process should be imposed in state-domain estimation. An 
introduction to some of these approaches, such as additive modeling, partially linear 
modeling, modeling with interactions, and multiple index models, can be found in Fan 
and Yao (2003). 

In this paper, we propose a factor modeling strategy to reduce the dimensionality
in state-domain smoothing. Specifically, to estimate the covariance matrix among
several assets, we first find a few factors that capture the main price dynamics of
the underlying assets. Regarding the covariance matrix as a smooth function of these
factors, we can then compute it by localizing on the factors.

Figure 1 here. 

Our approach is particularly appealing for the yields of bonds, as they are often
highly correlated, which makes the choice of the factors relatively easy. To elucidate our
idea, consider the weekly data on the yields of treasury bills and bonds with maturities
1 year, 5 years, and 10 years presented in Figure 1. We choose the 5-year yield process
as the single factor. Suppose that the current time is January 14, 2000 and the current
interest rate of the 5-year treasury bond is 6.67%, corresponding to time index t = 1986.
One may estimate the volatility matrix based on the weighted squared differences in the
past 104 weeks. This corresponds to time-domain smoothing, using the small vertical
stretch of data shown in Figure 1(a). On the other hand, one may also estimate the
volatility matrix using the historical data with interest rates approximately 6.67%, say,
6.67% ± 0.20%. This corresponds to localizing in the state domain and is indicated by the
horizontal bar in Figure 1(a). Figures 1(b) and 1(c) present scatter plots of the yield
differences $X_t^{1yr} - X_{t-1}^{1yr}$ for the 1-year bill against the yield differences
$X_t^{10yr} - X_{t-1}^{10yr}$ for the 10-year bond, using respectively the data localizing
in the time and state domains. The associated regression lines of the time- and
state-domain data are also presented. The scatter plots give two estimates of the
conditional correlation and conditional variance of the volatility matrix for the week of
t = 1986. They are weakly dependent, as the two scatter plots use data that are quite far
apart in time.

Let $\hat{\Sigma}_{T,t}$ and $\hat{\Sigma}_{S,t}$ be the estimated volatility matrices based
on data localizing in the time and state domains, respectively. For example, they can be
the sample covariance matrices for the data presented in Figures 1(b) and 1(c),
respectively, for t = 1986. To fully utilize these two estimators, we introduce a weight
$\omega_t$ and define an aggregated estimator² as

$$\hat{\Sigma}_{A,t} = \omega_t \hat{\Sigma}_{S,t} + (1 - \omega_t) \hat{\Sigma}_{T,t}. \tag{1}$$

The weight function $\omega_t$ is always between 0 and 1, and it can be an adaptive random
process which is observable at time t. Due to the weak dependence between the original
two estimators, the aggregated estimator, with a suitably chosen weight, is more efficient
than either of the time- and state-domain estimators.

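An interesting question is the choice of the dynamic weight $\omega_t$. Suppose we have a
portfolio with allocation vector $a$. Then the aggregation method gives us the following
estimate of the portfolio variance:

$$a^T \hat{\Sigma}_{A,t}\, a = \omega_t\, a^T \hat{\Sigma}_{S,t}\, a + (1 - \omega_t)\, a^T \hat{\Sigma}_{T,t}\, a. \tag{2}$$
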
Since $\hat{\Sigma}_{S,t}$ and $\hat{\Sigma}_{T,t}$ are nearly independent³, the optimal weight
in terms of minimizing the variance of $a^T \hat{\Sigma}_{A,t}\, a$ is

$$\omega_{opt,t} = \frac{\mathrm{var}(a^T \hat{\Sigma}_{T,t}\, a)}{\mathrm{var}(a^T \hat{\Sigma}_{S,t}\, a) + \mathrm{var}(a^T \hat{\Sigma}_{T,t}\, a)}. \tag{3}$$

Indeed, our asymptotic result in Section 3 shows that the optimal weight admits a
simple and explicit form, independent of $a$. This makes our implementation very easy.

The above approach is data analytic in the sense that it is always operational. To
appreciate our idea, we will introduce a mathematical model for the data-generating
process in Section 1. In the following sections, we then formally show that
the aggregated estimator has the desired statistical properties.

2 Ledoit and Wolf (2003) introduce a shrinkage estimator by combining the sample covariance
estimator with that derived from the CAPM. Their procedure intends to improve the estimated
covariance matrix by pulling the sample covariance towards the estimate based on the CAPM.
Their basic assumption is that the return vectors are i.i.d. across time. This usually
holds approximately when the data are localized in time. In this sense, their estimator
can be regarded as a time-domain estimator.

3 We prove in Section 4 that $\hat{\Sigma}_{S,t}$ and $\hat{\Sigma}_{T,t}$ are asymptotically
independent, and thus they are close to being independent in finite samples. In the
following, by "nearly independent" and "almost uncorrelated", we mean the same.

1 MODEL AND ASSUMPTIONS

Let $W_t = (W_t^1, \ldots, W_t^m)^T$ and $W = \{W_t, \mathcal{F}_t^W;\ 0 \le t < \infty\}$ be an
m-dimensional standard Brownian motion. Consider the following d-dimensional diffusion
process

$$dX_t = \mu_t\, dt + \sigma_t\, dW_t, \tag{4}$$

where $X_t = (X_t^1, \ldots, X_t^d)^T$, $\mu_t$ is a $d \times 1$ predictable vector process,
and $\sigma_t$ is a $d \times m$ predictable matrix process, depending only on $X_t$. Here,
m can be different from d. This is a widely used model for asset prices and the yields of
bonds. This family of models includes famous ones such as multivariate generalizations of
both Vasicek (1977) and Cox, Ingersoll and Ross (1985).

Under model (4), the diffusion matrix is $\Sigma_t = \sigma_t \sigma_t^T$. As mentioned
before, when d > 2, the so-called curse of dimensionality makes implementation hard. To
reduce the dimensionality, we introduce a scalar factor $f_t$ and model the drift and
diffusion processes as $\mu_t = \mu(f_t)$ and $\sigma_t = \sigma(f_t)$, where
$\mu(\cdot) = \{\mu_i(\cdot)\}_{1 \le i \le d}$ is a $d \times 1$ Borel measurable vector and
$\sigma(\cdot) = \{\sigma_{ij}(\cdot)\}_{1 \le i \le d,\, 1 \le j \le m}$ is a $d \times m$ Borel
measurable matrix. Then model (4) becomes

$$dX_t^i = \mu_i(f_t)\, dt + \sum_{j=1}^{m} \sigma_{ij}(f_t)\, dW_t^j, \qquad 1 \le i \le d. \tag{5}$$

In this model, the diffusion matrix is $\Sigma(f_t) = \sigma(f_t)\sigma(f_t)^T$. See also
Engle, Ng and Rothschild (1990) for a similar strategy.

We introduce some stochastic structure on $f_t$ by assuming that $f_t$ is the solution to
the following stochastic differential equation (SDE):

$$df_t = a(f_t)\, dt + \sum_{j=1}^{m} b_j(f_t)\, dW_t^j, \tag{6}$$

where $a(\cdot)$ and $b_1(\cdot), b_2(\cdot), \ldots, b_m(\cdot)$ are unknown functions. In some
situations, like modeling bond yields⁴, the factor $f_t$ can be chosen as one of the bond
yields, i.e., $f_t$ is one of the coordinates of $X_t$. But in general, $f_t$ may be
different from any coordinate of $X_t$, and the theoretical studies in this paper apply to
both cases. The data are observed at times $t_i = t_0 + i\Delta$, $i = 0, 1, \ldots, N$, with
sampling interval $\Delta$, resulting in vectors $\{X_{t_i}: i = 0, 1, \ldots, N\}$ and
$\{f_{t_i}: i = 0, 1, \ldots, N\}$. This model is reasonable for the yields of bonds with
different maturities since they are highly correlated. Thus, localizing on all the yield
processes in the state domain results in approximately the same data set as localizing on
only one of the yield processes. In addition, our study can be generalized to the
multi-factor case without much extra difficulty. We will focus on the one-factor setting
for simplicity of presentation.

Let $Y_i = (X_{t_{i+1}} - X_{t_i})\Delta^{-1/2}$, and denote by $Y_i^1, Y_i^2, \ldots, Y_i^d$ the
coordinates of $Y_i$. Then, by the Euler scheme, we have

$$Y_i \approx \mu(f_{t_i})\sqrt{\Delta} + \sigma(f_{t_i})\,\varepsilon_{t_i}, \tag{7}$$

where $\varepsilon_{t_i}$ follows the m-dimensional standard Gaussian distribution. The
conditional covariance matrix of $X$ at time $t_i$ can be approximated by
$\Delta\,\Sigma(f_{t_i})$ (see Fan and Zhang, 2003). Hence, estimating the conditional
covariance matrix is almost equivalent to estimating the diffusion matrix $\Sigma(\cdot)$.
Fan and Zhang (2003) study the impact of the order of difference on nonparametric
estimation. They find that while a higher order can possibly reduce approximation errors,
it increases the variances of the data substantially. They recommend the Euler scheme (7)
for most practical situations.
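To make the Euler approximation (7) concrete, the following is a minimal simulation sketch of the one-factor models (5) and (6) that also forms the normalized increments $Y_i$. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function name `simulate_factor_model`, the coefficient functions `a`, `b`, `mu`, `sigma`, and all numerical values.

```python
import numpy as np

def simulate_factor_model(N, delta, d=3, m=2, f0=0.06, seed=0):
    """Euler simulation of the factor SDE (6) and the d-dimensional SDE (5)."""
    rng = np.random.default_rng(seed)
    # Fixed d x m loading pattern (hypothetical).
    load = np.arange(1, d * m + 1, dtype=float).reshape(d, m) / (d * m)

    def a(f):       # hypothetical mean-reverting factor drift
        return 0.5 * (0.06 - f)

    def b(f):       # hypothetical factor loadings b_1, ..., b_m
        return np.r_[0.05, np.zeros(m - 1)]

    def mu(f):      # hypothetical d-vector drift of X
        return np.full(d, 0.1 * f)

    def sigma(f):   # hypothetical d x m diffusion coefficient
        return load * (0.1 + abs(f))

    f = np.empty(N + 1)
    X = np.zeros((N + 1, d))
    f[0] = f0
    for i in range(N):
        dW = rng.standard_normal(m) * np.sqrt(delta)   # Brownian increment
        f[i + 1] = f[i] + a(f[i]) * delta + b(f[i]) @ dW
        X[i + 1] = X[i] + mu(f[i]) * delta + sigma(f[i]) @ dW
    # Y_i = (X_{t_{i+1}} - X_{t_i}) / sqrt(delta), one row per time point.
    Y = np.diff(X, axis=0) / np.sqrt(delta)
    return f, Y, sigma
```

With a small $\Delta$ the drift contribution $\mu(f_{t_i})\sqrt{\Delta}$ is negligible, so the average of $Y_i Y_i^T$ over a long sample should be close to the average of $\Sigma(f_{t_i})$ along the simulated path.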

To use time-domain information, it is necessary to assume that the sampling interval
$\Delta$ converges to zero so that the biases in time-domain approximations are
negligible. As a result, we face the challenge of developing asymptotic theory for the
diffusion model (5). Both nonparametric estimators in the time domain and state
domain need to be investigated. Pioneering efforts on nonparametric estimation of
drift and diffusion include Jacod (1997), Jiang and Knight (1997), Arfi (1998), Gobet
(2002), Bandi and Phillips (2003), Cai and Hong (2003), Bandi and Moloche (2004),
and Chen and Gao (2004). Arapis and Gao (2004) investigate the mean aggregated
square errors of several methods for estimating the drift and diffusion, and compare
their performances. Aït-Sahalia and Mykland (2003, 2004) study the effects of random
and discrete sampling when estimating continuous-time diffusions. Bandi and Nguyen
(1999) investigate small sample behaviors of nonparametric diffusion parameters. See
Bandi and Phillips (2002) for a survey of recently introduced techniques for identifying
nonstationary continuous-time processes. As long as the time horizon is long, the
diffusion matrix can be estimated with low frequency data (say, finite $\Delta^{-1}$). See,
for example, Hansen et al. (1998) for the spectral method, Kessler and Sørensen (1999)
for parametric models, and Gobet et al. (2004) for specific univariate nonparametric
diffusions.

4 In practice, one can take the yield process with median term of maturity as the driving
factor, as this bond is highly correlated with both short-term and long-term bonds.

To facilitate our future presentation, we make the following assumptions:

Assumption 1. (Global Lipschitz and linear growth conditions). There exists a
constant $k_0 > 0$ such that

$$\|\mu(x) - \mu(y)\| + \|\sigma(x) - \sigma(y)\| \le k_0 |x - y|, \tag{8}$$

$$\|\mu(x)\|^2 + \|\sigma(x)\|^2 \le k_0^2 (1 + x^2),$$

for any $x, y \in \mathbb{R}$. Also, with $b(\cdot) = (b_1(\cdot), b_2(\cdot), \ldots, b_m(\cdot))^T$,
assume that

$$|a(x) - a(y)| + \|b(x) - b(y)\| \le k_0 |x - y|.$$

Assumption 2. Given any time point $t > 0$, there exists a constant $L > 0$ such that
$E|\mu_i(f_s)|^{4(q_0+\delta)} \le L$ and $E|\sigma_{ij}(f_s)|^{4(q_0+\delta)} \le L$ for any
$s \in [t - \eta, t]$ and $1 \le i, j \le d$, where $\eta$ is some positive constant, $q_0$ is
an integer not less than 1, and $\delta$ is some small positive constant.

Assumption 3. The solution $\{f_t\}$ of model (6) is a stationary Markov process and
real ergodic. For $t \ge 0$, define the transition operator by

$$(H_t g)(a) = E(g(f_t) \mid f_0 = a), \qquad a \in \mathbb{R},$$

where $g(\cdot)$ is any Borel measurable bounded function on $\mathbb{R}$. Suppose $H_t$
satisfies the $G_2$ condition of Rosenblatt (1970), i.e., there is some $s > 0$ such that

$$|H_s|_2 = \sup_{\{g:\ Eg(X) = 0\}} \frac{E^{1/2}(H_s g)^2(X)}{E^{1/2} g^2(X)} \le \alpha < 1.$$

Assumption 4. The conditional density $p_\ell(y \mid x)$ of $f_{t_{i+\ell}}$ given $f_{t_i}$ is
continuous in the arguments $(y, x)$ and is bounded by a constant independent of $\ell$.
The time-invariant density function $p(x)$ of the process $f_t$ is bounded and continuous.

Assumption 5. The kernel $K(\cdot)$ is a continuously differentiable, symmetric probability
density function satisfying

$$\int |x^j K'(x)|\, dx < \infty, \qquad j = 0, 1, \ldots, 5, \tag{9}$$

and

$$\mu_i = \int x^i K(x)\, dx < \infty, \qquad i = 0, 1, \ldots, 4, \tag{10}$$

$$\nu_0 = \int K^2(x)\, dx < \infty.$$



Let $\{\mathcal{F}_t\}$ be the augmented filtration defined in Lemma 2 of the Appendix.
Assumption 1 ensures that there exist continuous, adapted processes
$X = \{X_t \in \mathcal{F}_t;\ 0 \le t < \infty\}$ and $f = \{f_t \in \mathcal{F}_t;\ 0 \le t < \infty\}$,
which are strong solutions to SDEs (5) and (6), respectively, provided that the initial
values $X_0$ and $f_0$ satisfy $E\|X_0\|^2 < \infty$ and $E|f_0|^2 < \infty$, and are independent
of the Brownian motion $W$ (see, e.g., Chapter 5, Theorem 2.9 of Karatzas and Shreve, 1991).
Assumption 2 indicates that, given any time point $t > 0$, there is a time interval
$[t - \eta, t]$ on which the drift and volatility functions have finite
$4(q_0 + \delta)$-th moments. Assumption 3 says that $f_t$ is stationary and ergodic
and satisfies some mixing condition (see Fan, 2005), which ensures that $f_t$ is Harris
recurrent. For conditions under which the stationarity assumption on $f_t$ holds, see
Hansen and Scheinkman (1995). Assumption 4 imposes some constraints on the transition
density of $f_t$. Assumption 5 is a regularity condition on the kernel function. For
example, the commonly used Gaussian kernel satisfies it.

With the above theoretical framework and assumptions, we will formally demonstrate
that the nonparametric estimators using the data localizing in time and in state
are asymptotically jointly normal and independent. This gives a formal theoretical
justification and serves as the theoretical foundation for the idea that the time-domain
and state-domain nonparametric estimators can be combined to yield a more efficient
volatility matrix estimator.

2 DIFFUSION MATRIX ESTIMATION USING RECENT INFORMATION

The time-domain method has been extensively studied in the literature. See, for
example, Robinson (1997), Härdle et al. (2002), Fan, Jiang, Zhang and Zhou (2003), and
Mercurio and Spokoiny (2004), among others. A popular time-domain method, the
moving average estimator, is defined as

$$\hat{\Sigma}_{MA,t} = \frac{1}{n} \sum_{i=1}^{n} Y_{t-i} Y_{t-i}^T, \tag{11}$$

where n is the size of the moving window. This estimator ignores the drift component
and utilizes n local data points. An extension of the moving average estimator is the
exponential smoothing estimator, which is defined as

$$\hat{\Sigma}_{ES,t} = (1 - \lambda) \sum_{i=1}^{\infty} \lambda^{i-1} Y_{t-i} Y_{t-i}^T, \tag{12}$$

where $\lambda$ is a smoothing parameter controlling the size of the local neighborhood.
RiskMetrics of J. P. Morgan (1996), which is used for measuring the risks of financial
assets, recommends $\lambda = 0.94$ and $\lambda = 0.97$ when one uses (12) to forecast the
daily and monthly volatility, respectively.

The exponential smoothing estimator (12) is one type of rolling sample variance
estimator. See Foster and Nelson (1996) for more information about rolling sample
variance estimators. Estimator (12) is also related to the multivariate GARCH model
in the literature. Note that when $\Delta$ is very small, the first term on the right-hand
side of (7) can be ignored. Thus (7) and (12) can be written as

$$Y_i \approx \sigma(f_{t_i})\,\varepsilon_i,$$

$$\Sigma_{t_i} = (1 - \lambda)\, Y_{i-1} Y_{i-1}^T + \lambda\, \Sigma_{t_{i-1}},$$

where $\Sigma_{t_i} = \sigma(f_{t_i})\sigma(f_{t_i})^T$, which is reminiscent of the IGARCH model.

The exponential smoothing estimator in (12) is a weighted sum of squared returns
prior to time t. Since the weight decays exponentially, it essentially uses recent data.
To explicitly account for this, we use a slightly modified version:

$$\hat{\Sigma}_{T,t} = \frac{1 - \lambda}{1 - \lambda^n} \sum_{i=1}^{n} \lambda^{i-1} Y_{t-i} Y_{t-i}^T. \tag{13}$$

Here, as in the case of the moving average estimator in (11), n is a smoothing parameter
controlling explicitly the window width, and $\lambda$ acts like a kernel weight which may
depend on n. For example, when $\lambda = 1 - \tau/n$ with $\tau$ a positive constant, besides
the normalization factor $\frac{1-\lambda}{1-\lambda^n}$, the first data point $Y_{t-1}$ receives
weight 1, while the last point $Y_{t-n}$ receives approximately weight $e^{-\tau}$. In
particular, when $\lambda = 1$, it becomes the moving average estimator (11).
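The time-domain estimators (11) and (13) can be sketched in a few lines. This is an illustrative implementation, not the authors' code; it assumes that `Y` is an array with one row per time point, ordered so that the last row is $Y_{t-1}$, and the function names are hypothetical.

```python
import numpy as np

def moving_average(Y, n):
    """Estimator (11): average of the last n outer products Y_{t-i} Y_{t-i}^T."""
    recent = Y[-n:]
    return recent.T @ recent / n

def truncated_exp_smoothing(Y, n, lam):
    """Estimator (13): exponentially weighted outer products over a window of n."""
    recent = Y[-n:][::-1]          # recent[0] is Y_{t-1}, recent[n-1] is Y_{t-n}
    w = lam ** np.arange(n)        # lambda^{i-1} for i = 1, ..., n
    w = w / w.sum()                # w.sum() equals (1 - lam^n) / (1 - lam) when lam < 1
    return (recent * w[:, None]).T @ recent
```

Normalizing by `w.sum()` rather than by the closed-form factor keeps the code valid at `lam = 1.0`, where it reproduces the moving average, for example `truncated_exp_smoothing(Y, 104, 0.94)` versus `moving_average(Y, 104)`.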

Before going into the details, we first introduce some notation and definitions. Let
$A = (a_{ij})$ be an $m \times n$ matrix. By vec(A) we mean the $mn \times 1$ vector formed by
stacking the columns of A. If A is also symmetric, we vectorize the lower half of A and
denote the vector by vech(A). This notation is consistent with Bandi and Moloche
(2004). It is not difficult to verify that there exists a unique $m^2 \times m(m+1)/2$ matrix
D with elements 0 and 1, such that

$$P_D\, \mathrm{vec}(A) = \mathrm{vech}(A),$$

where $P_D = (D^T D)^{-1} D^T$. Another useful definition is the Kronecker product of two
matrices A and B, which is defined as $A \otimes B = (a_{ij} B)$.
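As a small sketch of these definitions, the code below builds the 0/1 matrix D and checks the identity $P_D\,\mathrm{vec}(A) = \mathrm{vech}(A)$ numerically. It assumes the common convention that vech stacks the lower triangle column by column; the helper names are hypothetical.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector."""
    return A.reshape(-1, order="F")

def vech(A):
    """Stack the lower triangle of a square matrix, column by column."""
    m = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(m)])

def duplication_matrix(m):
    """0/1 matrix D of shape (m^2, m(m+1)/2) with D @ vech(A) == vec(A) for symmetric A."""
    D = np.zeros((m * m, m * (m + 1) // 2))
    col = 0
    for j in range(m):
        for i in range(j, m):
            D[j * m + i, col] = 1.0    # position of a_ij in vec(A)
            D[i * m + j, col] = 1.0    # position of a_ji in vec(A)
            col += 1
    return D
```

Given `D = duplication_matrix(m)`, the matrix $P_D$ is `np.linalg.solve(D.T @ D, D.T)`, and `np.kron(S, S)` gives the Kronecker product $S \otimes S$.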

Since the estimator $\hat{\Sigma}_{T,t}$ is symmetric, we only need to consider the asymptotic
normality of the linear combination of the vector vech($\hat{\Sigma}_{T,t}$):

$$U_{T,t} = c^T \mathrm{vech}\, \hat{\Sigma}_{T,t} = \frac{1 - \lambda}{1 - \lambda^n} \sum_{i=1}^{n} \lambda^{i-1} \sum_{k=1}^{d} \sum_{l=1}^{k} c_{k,l}\, Y_{t-i}^k Y_{t-i}^l, \tag{14}$$

where $c = (c_{1,1}, c_{2,1}, c_{2,2}, c_{3,1}, \ldots, c_{d,d})^T$ is a constant vector.

Proposition 1 Under Assumptions 1 and 2, for almost every sample path, we have

$$\|\sigma(f_s) - \sigma(f_u)\| \le K |s - u|^q, \qquad s, u \in [t - \eta, t], \tag{15}$$

where $q = (2q_0 - 1)/(4q_0)$, $q_0$ is the integer in Assumption 2, and the coefficient K
satisfies $E[K^{4(q_0+\delta)}] < \infty$ with $\delta$ a positive constant.

Remark 1. Proposition 1 shows the continuity of $\sigma(f_s)$ as a function of time s, which
is the foundation of time-domain estimation. In the proof of Proposition 1 we only
use Assumption 2 and the condition $\|\sigma(x) - \sigma(y)\| \le k_0 |x - y|$ with $k_0$ a
positive constant. Assumption 1 is made to ensure the existence of a solution to model (5).

Theorem 1 Suppose that $n \to \infty$, $n\Delta^{2q/(2q+1)} \to 0$, and Assumptions 1 and 2 hold
at time t. If the limit $\tau = \lim_{n \to \infty} n(1 - \lambda)$ exists, then given $f_t = x$,
the conditional distribution of vech($\hat{\Sigma}_{T,t}$) is asymptotically normal, i.e.,

$$\sqrt{n}\, \mathrm{vech}\big(\hat{\Sigma}_{T,t} - \Sigma(x)\big) \xrightarrow{D} N\Big(0,\ \frac{\tau(e^{\tau} + 1)}{e^{\tau} - 1}\, \Lambda(x)\Big),$$

where $\Lambda(x) = P_D \{\Sigma(x) \otimes \Sigma(x)\} P_D^T$.

Note that all data used in the estimator (13) are within $n\Delta$ of time t. According
to Proposition 1, the approximation error of (13) is at most of order $O((n\Delta)^q)$,
which together with the condition $n\Delta^{2q/(2q+1)} \to 0$ in Theorem 1 guarantees that the
bias is of order $o(n^{-1/2})$.
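A heuristic way to see where the constant $\tau(e^{\tau}+1)/(e^{\tau}-1)$ comes from: for approximately i.i.d. Gaussian $Y_{t-i}$, a weighted average of the outer products with normalized weights $w_i$ has covariance roughly $2\sum_i w_i^2\,\Lambda(x)$, and with $\lambda = 1 - \tau/n$ the quantity $2n\sum_i w_i^2$ converges to that constant. The sketch below checks this limit numerically; it is a sanity check under these simplifying assumptions, not the proof, and the function names are hypothetical.

```python
import numpy as np

def finite_sample_factor(n, tau):
    """2 * n * sum(w_i^2) for the normalized weights of (13) with lambda = 1 - tau/n."""
    lam = 1.0 - tau / n
    w = lam ** np.arange(n)
    w = w / w.sum()
    return 2.0 * n * np.sum(w ** 2)

def limiting_factor(tau):
    """tau * (e^tau + 1) / (e^tau - 1), the variance constant of the time-domain estimator."""
    return tau * (np.exp(tau) + 1.0) / (np.exp(tau) - 1.0)
```

As $\tau \to 0$ the constant tends to 2, which matches the moving average case, where all n observations receive equal weight.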

3 DIFFUSION MATRIX ESTIMATION USING HISTORICAL INFORMATION

The diffusion matrix in (5) can also be regarded as a nonparametric regression given
$f_t = x$. See, for example, its first-order approximation (7). Therefore, it can be
estimated by using the historical information via localizing on the state variable $f_t$,
as illustrated in Figure 1. The local linear smoother studied in Stanton (1997) will be
employed. This technique has several nice properties, such as asymptotic minimax
efficiency and design adaptation. Further, it automatically corrects edge effects and
facilitates bandwidth selection (Fan and Yao, 2003).

In the construction of the state-domain estimator, we will use the N − 1 data points
right before the current time t, i.e., the historical data
$\{(f_{t_i}, Y_i),\ i = 0, 1, \ldots, N - 1\}$.

It can be shown that the diffusion matrix has the standard interpretation in terms
of infinitesimal conditional moments, that is,

$$E[Y_k^i Y_k^j \mid f_{t_k} = x] = v_{ij}(x) + O(\Delta),$$

where $v_{ij}(x)$ is the $(i, j)$ entry of $\Sigma(x)$. For a given kernel function⁵ K and a
bandwidth h, the local linear estimator $\hat{\beta}_0$ of $v_{ij}(x_0)$ is obtained by
minimizing the objective function

$$\sum_{k=0}^{N-1} \{Y_k^i Y_k^j - \beta_0 - \beta_1 (f_{t_k} - x_0)\}^2 K_h(f_{t_k} - x_0) \tag{16}$$

over $\beta_0$ and $\beta_1$. Let

$$W_\ell(x) = \sum_{k=0}^{N-1} (f_{t_k} - x)^\ell K_h(f_{t_k} - x) \tag{17}$$

and

$$w_k(x) = K_h(f_{t_k} - x) \{W_2(x) - (f_{t_k} - x) W_1(x)\} / \{W_0(x) W_2(x) - W_1(x)^2\}. \tag{18}$$

Then the local linear estimator in (16) can be expressed as

$$\hat{\Sigma}_{S,t}(x) = \sum_{k=0}^{N-1} w_k(x)\, Y_k Y_k^T. \tag{19}$$

This estimator depends only on the historical data (horizontal bar in Figure 1), and
relies on the structural invariability.

The above weight function $w_k(x)$ is called an "equivalent kernel" in Fan and Yao
(2003). Expression (19) reveals that the estimator $\hat{\Sigma}_{S,t}(x)$ is very much like a
conventional kernel estimator except that the "kernel" $w_k(x)$ depends on the design
points and locations.
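The state-domain estimator (17) to (19) can be sketched as follows. This is an illustrative implementation rather than the authors' code: it assumes the usual scaling $K_h(u) = K(u/h)/h$, uses the Epanechnikov kernel, and takes `f` as the array of factor values $f_{t_k}$ and `Y` as the matching array of normalized increments (one row per time point). The function names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2)_+."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)

def equivalent_kernel_weights(f, x, h):
    """Weights w_k(x) of (18), built from W_0, W_1, W_2 in (17)."""
    u = f - x
    Kh = epanechnikov(u / h) / h                  # assumed scaling K_h(u) = K(u/h)/h
    W0, W1, W2 = (np.sum(u ** l * Kh) for l in range(3))
    return Kh * (W2 - u * W1) / (W0 * W2 - W1 ** 2)

def state_domain_estimator(f, Y, x, h):
    """Estimator (19): sum over k of w_k(x) Y_k Y_k^T."""
    w = equivalent_kernel_weights(f, x, h)
    return (Y * w[:, None]).T @ Y
```

By construction the weights sum to one and satisfy $\sum_k w_k(x)(f_{t_k} - x) = 0$, so the estimator reproduces any function that is exactly linear in the factor; this is the property behind the automatic edge correction of local linear smoothing. The bandwidth must be large enough that the kernel window contains at least two distinct factor values, otherwise the denominator in (18) vanishes.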

Before establishing the asymptotic normality of $\hat{\Sigma}_{S,t}(x)$, we first investigate
the asymptotic property of $W_\ell(x)$.

5 The kernel function is a probability density, and the bandwidth is its associated scale
parameter. Both of them are used to localize the linear regression around the given point
$x_0$. The commonly used kernel functions are the Gaussian density and the Epanechnikov
kernel $K(x) = 0.75(1 - x^2)_+$.

Proposition 2 Suppose $\Delta \to 0$, $N\Delta \to \infty$, and
$\frac{1}{h}\sqrt{\Delta \log \Delta^{-1}} = o(1)$. Under Assumptions 3-5, we have

$$W_\ell(x) = N h^\ell \{p(x)\mu_\ell + o_{a.s.}(1)\}, \qquad \ell = 0, 1, 2, 3. \tag{20}$$

The results of Proposition 2 are similar to those in Section 6.3.3 of Fan and Yao
(2003, p. 237), but the proofs are completely different, as we have a highly correlated
sample $\{f_{t_i}\}$ here. The high correlation makes their proof fail in our case. To attack
this problem, we invoke the local time. The definition and some preliminary results of
local time can be found in Revuz and Yor (1999, p. 221). For the multifactor situation,
the local time generally does not exist. However, by using the occupation time of Bandi
and Moloche (2004), our results can be generalized to the multifactor situation.

Theorem 2 Suppose $\Delta \to 0$, $N\Delta \to \infty$, $h = O(N^{-1/5})$, and
$\frac{1}{h}\sqrt{\Delta \log \Delta^{-1}} = o(1)$. Moreover, suppose that $\Sigma(\cdot)$ is twice
differentiable. Under Assumptions 3-5, the state-domain estimator has the following
asymptotic normality:

$$\sqrt{Nh}\, \mathrm{vech}\Big(\hat{\Sigma}_{S,t}(x) - \Sigma(x) - \frac{1}{2} h^2 \mu_2 \Sigma''(x)\Big) \xrightarrow{D} N\big(0,\ 2\nu_0\, p(x)^{-1} \Lambda(x)\big),$$

where $\Sigma''(x)$ is the matrix whose entries are the second derivatives of the
corresponding entries of $\Sigma(x)$.

Proposition 2 and Theorem 2 are both studied under the assumption of high-frequency
data over a long time horizon, i.e., $\Delta \to 0$ and $N\Delta \to \infty$. Various studies
under this assumption include Arfi (1998), Gobet (2002), and Fan and Zhang (2003).



4 DYNAMIC AGGREGATION OF TIME- AND STATE-DOMAIN ESTIMATORS

In this section, we show that the nonparametric estimators in the time and state
domains are asymptotically independent. This allows us to combine these two estimators
to yield a more efficient one.

6 The stationarity condition on $f_t$ in Assumption 3 can be weakened to Harris recurrence.
See Bandi and Moloche (2004) for asymptotic normality of the local constant estimator
under the recurrence assumption.

4.1 Asymptotic Normality

The time- and state-domain estimators defined in the previous sections are both driven
by the factor process $f_t$. Intuitively, with high probability, most of the data they use
are far apart in time. Since the Markov process $f_t$ is stationary and satisfies some
mixing condition (Assumption 3), it is reasonable to expect that the time- and
state-domain nonparametric estimators are also asymptotically independent. The following
theorem formally shows this result.

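Theorem 3 Under the conditions of Theorems 1 and 2, conditioning on $f_t = x$, we
have

(i) asymptotic independence:

$$\begin{pmatrix} \sqrt{n}\, \mathrm{vech}\{\hat{\Sigma}_{T,t} - \Sigma(x)\} \\ \sqrt{Nh}\, \mathrm{vech}\{\hat{\Sigma}_{S,t}(x) - \Sigma(x) - \frac{1}{2} h^2 \mu_2 \Sigma''(x)\} \end{pmatrix} \xrightarrow{D} N\left(0,\ \begin{pmatrix} \frac{\tau(e^{\tau}+1)}{e^{\tau}-1} \Lambda(x) & 0 \\ 0 & 2\nu_0\, p(x)^{-1} \Lambda(x) \end{pmatrix}\right);$$

(ii) asymptotic normality of the aggregated estimator $\hat{\Sigma}_{A,t}(x)$ in (1):

$$\sqrt{Nh}\, \mathrm{vech}\Big(\hat{\Sigma}_{A,t}(x) - \Sigma(x) - \frac{1}{2}\omega_t(x) h^2 \mu_2 \Sigma''(x)\Big) \xrightarrow{D} N(0, \Omega(x)),$$

where $\Omega(x) = \big\{2\omega_t^2(x)\nu_0\, p(x)^{-1} + b(1 - \omega_t(x))^2 \frac{\tau(e^{\tau}+1)}{e^{\tau}-1}\big\} \Lambda(x)$,
provided that $\lim Nh/n = b$ for some positive constant b and $h = O(N^{-1/5})$.

Note that the nonparametric estimator in the time domain uses n data points and
the nonparametric estimator in the state domain effectively uses the amount $O(Nh)$
of data. The condition $\lim Nh/n = b$ ensures that both estimators effectively use the
same amount (order) of data, which avoids the trivial case that either the time domain
or the state domain dominates the performance.

4.2 Choice of the Dynamic Weight

A natural question is how to choose the dynamic weight $\omega_t(x)$. By Theorem 3(i) and
(3), it is easy to see that for any allocation vector $a$, the asymptotic optimal weight is

$$\omega_t(x) = \frac{b\tau(1 + e^{\tau})\, p(x)}{2\nu_0 (e^{\tau} - 1) + b\tau(1 + e^{\tau})\, p(x)}, \tag{21}$$

which is independent of $a$. This choice⁷ also optimizes the performance of the
aggregated covariance estimator $\hat{\Sigma}_{A,t}(x)$. Indeed, by Theorem 3(ii), the asymptotic
covariance matrix of $\hat{\Sigma}_{A,t}(x)$ is given by $\Omega(x)$. It depends on the weight
through the coefficient

$$\psi_t(x) = 2\omega_t^2(x)\nu_0\, p(x)^{-1} + b(1 - \omega_t(x))^2\, \frac{\tau(1 + e^{\tau})}{e^{\tau} - 1},$$

which is a quadratic function of $\omega_t(x)$, and attains its minimum at (21).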
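The weight (21) and the aggregation step can be sketched as below. The function names and argument names are hypothetical; `p_x` stands for the factor density at the current state (estimated in practice), `nu0` for $\int K^2$ (0.6 for the Epanechnikov kernel), and `b` and `tau` for the limits $\lim Nh/n$ and $\lim n(1-\lambda)$, which in finite samples would be replaced by $Nh/n$ and $n(1-\lambda)$.

```python
import numpy as np

def time_factor(tau):
    """tau * (1 + e^tau) / (e^tau - 1), the variance constant of the time-domain estimator."""
    return tau * (1.0 + np.exp(tau)) / (np.exp(tau) - 1.0)

def optimal_weight(p_x, b, tau, nu0):
    """Weight (21) placed on the state-domain estimator."""
    c = b * tau * (1.0 + np.exp(tau)) * p_x
    return c / (2.0 * nu0 * (np.exp(tau) - 1.0) + c)

def variance_coefficient(w, p_x, b, tau, nu0):
    """Scalar multiplying Lambda(x) in the asymptotic covariance for a given weight w."""
    return 2.0 * w ** 2 * nu0 / p_x + b * (1.0 - w) ** 2 * time_factor(tau)

def aggregate(S_state, S_time, w):
    """Aggregated estimator: w * state-domain + (1 - w) * time-domain."""
    return w * S_state + (1.0 - w) * S_time
```

Setting the derivative of the quadratic coefficient to zero gives $\omega = b\,c_\tau\, p / (2\nu_0 + b\,c_\tau\, p)$ with $c_\tau = \tau(1+e^{\tau})/(e^{\tau}-1)$, which is (21) after multiplying numerator and denominator by $e^{\tau} - 1$.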

When $0 < b < \infty$, the effective sample sizes in the time and state domains are
comparable. Hence, neither the time-domain nor the state-domain estimator dominates.
Therefore, by aggregating the time- and state-domain estimators, we obtain an optimal
reduction of asymptotic variance. The biases of the aggregated estimator are indirectly
controlled when the optimal smoothing is conducted for both time- and state-domain
estimators, so that their biases and variances are already traded off before aggregation.

Note that at time t, the optimal weight $\omega_t(x)$ depends on the current value of
the factor process $f$ through the density function $p(x)$. This is consistent with
common sense. When $f$ is low or high, $p(x)$ and, consequently, the optimal weight are
approximately zero. In this case, the main contribution to the aggregated estimator
comes from the time-domain estimator. When $f$ is well in the middle of its state space,
say near its unconditional mathematical expectation, the state-domain estimator tends to
dominate the aggregated estimator.

In practice, the density function $p(x)$ is unknown and should be estimated. Many
methods exist for this, such as the kernel density estimator and the local
time density estimator (see Aït-Sahalia, 1996; and Dalalyan and Kutoyants, 2003).



7 The optimal choice of weight is proportional to the effective number of data points used
for the state-domain and time-domain smoothing. It always outperforms the choice with
$\omega_t = 1$ (state-domain estimator) or $\omega_t = 0$ (time-domain estimator).

5 NUMERICAL ANALYSIS

To evaluate the aggregated estimator, we compare it with the time-domain estimator
and the state-domain estimator. For the time-domain estimation, we apply the
exponential smoothing with $\lambda = 0.94$⁸. For the state-domain estimation, we choose one
yield process as the "factor," and then use it to estimate the volatility matrix. The
Epanechnikov kernel is used with the bandwidth h chosen by the generalized
cross-validation method (see Fan and Yao, 2003). To choose the optimal weight
$\omega_t(x)$, we estimate the density function $p(x)$ by the kernel density estimator
(see Aït-Sahalia, 1996).

The following three measures are employed to assess the performance of different
methods for estimating the diffusion matrix. The first two can only be used in
simulation, and the last one can be used in both simulation and real data analysis.

Measure 1. The entropy loss is given by

$$\ell_1(\Sigma_t, \hat{\Sigma}_t) = \mathrm{tr}(\Sigma_t^{-1} \hat{\Sigma}_t) - \log|\Sigma_t^{-1} \hat{\Sigma}_t| - \dim(\Sigma_t).$$

Measure 2. The quadratic loss is defined as

$$\ell_2(\Sigma_t, \hat{\Sigma}_t) = \mathrm{tr}(\hat{\Sigma}_t - \Sigma_t)^2.$$

Measure 3. The prediction error (PE) is computed as

$$\mathrm{PE}(\hat{\Sigma}_t) = \frac{1}{m} \sum_{i=T+1}^{T+m} \mathrm{tr}(Y_i Y_i^T - \hat{\Sigma}_{t_i})^2 \tag{22}$$

for an out-sample of size m. The expected value can be decomposed as

$$E[\mathrm{PE}(\hat{\Sigma}_t)] = \frac{1}{m} \sum_{i=T+1}^{T+m} E[\mathrm{tr}(Y_i Y_i^T - \Sigma_{t_i})^2] + \frac{1}{m} \sum_{i=T+1}^{T+m} E[\mathrm{tr}(\Sigma_{t_i} - \hat{\Sigma}_{t_i})^2].$$

Note that the second term reflects the effectiveness of the estimated diffusion matrix,
while the first term is the size of the stochastic error, independent of the estimators.
The first term is usually an order of magnitude larger than the second term. Thus,
a small improvement in PE means a substantial improvement in estimated volatility.
This will also be clearly demonstrated in our simulation study (see Figure 4).
Measure 4. Adaptive prediction error (APE).

As seen above, the dominant part of the PE is the stochastic error; however, what
we really care about is the estimation error. To reduce the stochastic error in (22), we
define the following adaptive prediction error:



where k is a nonnegative integer. The basic idea is to average out the stochastic errors
first before computing square losses, but this creates bias when k is large. When k = 0,
the APE reduces to the PE defined in (22).

8 The choice comes from the recommendation of the RiskMetrics of J. P. Morgan. The
parameter $\lambda$ can also be chosen automatically from the data by using the prediction
error as in Fan, Jiang, Zhang and Zhou (2003). Since we compare the relative performance
between the time-domain estimator and the aggregated estimator, we opt for this simple
choice. The results are not expected to change much when a data-driven technique is used.
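Measures 1 to 3 are straightforward to compute. The sketch below is illustrative only; it assumes the quadratic loss is $\mathrm{tr}[(\hat{\Sigma}_t - \Sigma_t)^2]$ and that the out-of-sample increments and the corresponding estimates are supplied as sequences of equal length. The function names are hypothetical.

```python
import numpy as np

def entropy_loss(sigma, sigma_hat):
    """Measure 1: tr(S^{-1} S_hat) - log det(S^{-1} S_hat) - dim(S)."""
    M = np.linalg.solve(sigma, sigma_hat)     # S^{-1} S_hat without forming the inverse
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - sigma.shape[0]

def quadratic_loss(sigma, sigma_hat):
    """Measure 2 (assumed form): tr[(S_hat - S)^2]."""
    diff = sigma_hat - sigma
    return np.trace(diff @ diff)

def prediction_error(Y_out, sigma_hats):
    """Measure 3: average of tr[(Y_i Y_i^T - S_hat_i)^2] over the out-of-sample period."""
    errs = [quadratic_loss(S, np.outer(y, y)) for y, S in zip(Y_out, sigma_hats)]
    return float(np.mean(errs))
```

The entropy loss is zero when the estimate equals the true matrix and positive otherwise for positive definite inputs, which makes it a convenient check in simulations where $\Sigma_t$ is known.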

5.1 Simulation 

We use an essentially affine market price of risk specifications in Duffee (2002) to sim- 
ulate bond yields, and hence to obtain simulated multivariate time series. Essentially 
affine model is the multivariate extension of the square-root process. It has been proved 
useful in forecasting future yields (see Duffee, 2002). Cheridito, Filipovic and Kimmel 
(2005) investigate the essentially affine model with one, two, and three state variables, 
and give estimates of the parameters. We use their one state variable model to conduct 
the simulations. 

The one state variable affine term structure model assumes that the instantaneous 
nominal interest rate $r_t$ is given by 

$$r_t = d_0 + d_1 s_t, \tag{23}$$

where $d_0$ and $d_1$ are scalars, and $s_t$ is a scalar state variable. The evolution of the state 
variable $s_t$ under the risk-neutral measure $Q$ is assumed to be 

$$ds_t = (a_1^Q + b_{11}^Q s_t)\,dt + \sqrt{s_t}\,dW_t^Q. \tag{24}$$

This is the well-known Cox-Ingersoll-Ross (CIR) model. 

Let $P(t, \tau)$ be the time-$t$ price of a zero-coupon bond maturing at $t + \tau$. Under the 
affine term structure and the assumption of no arbitrage, Duffie and Kan (1996) show 
that the bond price admits the form 

$$P(t, \tau) = \exp[A(\tau) - B(\tau) s_t], \tag{25}$$
where $A(\tau)$ and $B(\tau)$ are both scalar functions satisfying the following ordinary 
differential equations (ODEs) 

$$\frac{dA(\tau)}{d\tau} = -a_1^Q B(\tau) - d_0 \quad\text{and}\quad \frac{dB(\tau)}{d\tau} = b_{11}^Q B(\tau) - \frac{1}{2}B^2(\tau) + d_1. \tag{26}$$

Thus, the bond's yield 

$$y(s_t, \tau) = -\frac{1}{\tau}\log P(t, \tau) = \frac{1}{\tau}[-A(\tau) + B(\tau)s_t] \tag{27}$$

is affine in the state variable $s_t$. 
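
The ODE system (26) and the yield formula (27) can be illustrated numerically. The following is a minimal sketch, not the authors' code: it integrates (26) with a classical fourth-order Runge-Kutta scheme, using the parameter values quoted in this section. The initial conditions $A(0) = B(0) = 0$ are an assumption (they make $P(t, 0) = 1$ in (25)), and the function names `solve_AB` and `bond_yield` are hypothetical.

```python
import numpy as np

# Parameter values quoted in the text for the one-factor model.
A1_Q, B11_Q, D0, D1 = 0.5, -0.0137, 0.0110, 0.0074


def ode_rhs(state):
    """Right-hand side of the ODE system (26) for state = (A, B)."""
    A, B = state
    dA = -A1_Q * B - D0
    dB = B11_Q * B - 0.5 * B**2 + D1
    return np.array([dA, dB])


def solve_AB(tau_max, n_steps=2000):
    """Integrate (26) on [0, tau_max] with classical RK4.

    ASSUMPTION: A(0) = B(0) = 0, so that P(t, 0) = 1 in (25).
    Returns the maturity grid and an array whose columns are A(tau), B(tau).
    """
    taus = np.linspace(0.0, tau_max, n_steps + 1)
    h = tau_max / n_steps
    out = np.zeros((n_steps + 1, 2))
    for i in range(n_steps):
        y = out[i]
        k1 = ode_rhs(y)
        k2 = ode_rhs(y + 0.5 * h * k1)
        k3 = ode_rhs(y + 0.5 * h * k2)
        k4 = ode_rhs(y + h * k3)
        out[i + 1] = y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return taus, out


def bond_yield(s, tau, taus, AB):
    """Yield (27): [-A(tau) + B(tau) * s] / tau, interpolating A and B on the grid."""
    A = np.interp(tau, taus, AB[:, 0])
    B = np.interp(tau, taus, AB[:, 1])
    return (-A + B * s) / tau


taus, AB = solve_AB(8.0)
maturities = [1.0 / 12.0, 2.0, 4.0, 6.0, 8.0]  # in years
yields_at_s = [bond_yield(1.0, m, taus, AB) for m in maturities]
```

For a very short maturity the yield should be close to the short rate $d_0 + d_1 s_t$ in (23), which gives a quick sanity check on the solver.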

We use the above model to simulate 5 zero-coupon bond yield processes with 
maturities 1 month, 2 years, 4 years, 6 years, and 8 years. Since there is only one state 
variable $s_t$, the bond yields of different maturities are perfectly linearly related, as 
shown in (27), which is an unrealistic artifact of the model. To mitigate this problem, 
Cheridito et al. (2005) assume that only the 1-month yield process is observed without 
error, while other yields are contaminated with i.i.d. multivariate Gaussian errors 
with mean zero and unknown covariance matrix. They estimate the unknown parameters 
from the yields of zero-coupon bonds extracted from the US Treasury security 
prices from January 1972 to December 2002. The estimated parameters are $a_1^Q = 0.5$, 
$b_{11}^Q = -0.0137$, $d_0 = 0.0110$, and $d_1 = 0.0074$. The standard deviations of the Gaussian 
errors are estimated as $\sigma_1 = 0.0119$, $\sigma_2 = 0.0144$, $\sigma_3 = 0.0155$, and $\sigma_4 = 0.0159$ 
for the yields of 2-, 4-, 6-, and 8-year bonds, respectively. The associated correlation 
coefficients are estimated as $\rho_{12} = 0.9727$, $\rho_{13} = 0.9511$, $\rho_{14} = 0.9371$, $\rho_{23} = 0.9950$, 
$\rho_{24} = 0.9877$, and $\rho_{34} = 0.9978$. 

Figure 2 here. 

In the simulation, we set the parameter values to be the above estimated values 
from Cheridito et al. (2005). We first generate discrete samples of the state variable $s_t$ 
from the diffusion process (24). Then we solve the ODEs in (26) numerically. Figure 2 shows 
the solution to (26). After that, we obtain the ideal yield processes by using (27) with 
maturities 1 month, 2 years, 4 years, 6 years, and 8 years. Finally, we add the i.i.d. 
4-variate normal errors to the last 4 ideal yield processes to obtain the observed bond 
processes with these maturities. 
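
The contamination step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the standard deviations and correlations are the values quoted above, the ordering of the four noisy yields as (2, 4, 6, 8 years) is taken from the text, and the helper name `contaminate` and the all-zero placeholder for the ideal yields are hypothetical.

```python
import numpy as np

# Error standard deviations and correlations quoted in the text
# for the 2-, 4-, 6-, and 8-year yields.
SIGMA = np.array([0.0119, 0.0144, 0.0155, 0.0159])
RHO = np.array([
    [1.0,    0.9727, 0.9511, 0.9371],
    [0.9727, 1.0,    0.9950, 0.9877],
    [0.9511, 0.9950, 1.0,    0.9978],
    [0.9371, 0.9877, 0.9978, 1.0],
])
NOISE_COV = np.outer(SIGMA, SIGMA) * RHO  # covariance = D * RHO * D


def contaminate(ideal_yields, rng):
    """Add i.i.d. 4-variate Gaussian errors to columns 1..4 of a (T, 5) array.

    Column 0 (the 1-month yield) is left error-free, as described in the text.
    """
    ideal_yields = np.asarray(ideal_yields, dtype=float)
    n_obs = ideal_yields.shape[0]
    noise = rng.multivariate_normal(np.zeros(4), NOISE_COV, size=n_obs)
    observed = ideal_yields.copy()
    observed[:, 1:] += noise
    return observed


rng = np.random.default_rng(0)
ideal = np.zeros((20000, 5))  # placeholder for the ideal yields obtained from (27)
observed = contaminate(ideal, rng)
```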



9 Here we add normal noise to make the model more realistic. Our method performs even better 
To generate the sample path of $s_t$, we use the transition density property of the 
process. That is, given $s_t = x$, the variable $2c s_{t+\Delta}$ has a noncentral chi-squared 
distribution with degrees of freedom $4a_1^Q$ and noncentrality parameter $2cxe^{b_{11}^Q\Delta}$, where 
$c = 2b_{11}^Q / (\exp(b_{11}^Q\Delta) - 1)$. The initial value $s_0$ is generated from the invariant 
distribution of $s_t$, which is the gamma distribution with density 
$p(y) = \frac{\omega^\nu}{\Gamma(\nu)} y^{\nu-1} e^{-\omega y}$, where $\nu = 2a_1^Q$ and $\omega = -2b_{11}^Q$. 
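
The exact sampling scheme just described can be sketched with NumPy's noncentral chi-squared and gamma generators. This is a minimal sketch, assuming the state follows $ds_t = (a + b s_t)\,dt + \sqrt{s_t}\,dW_t$ with $a = a_1^Q$ and $b = b_{11}^Q < 0$; the function names are hypothetical.

```python
import numpy as np


def cir_exact_step(x, a, b, delta, rng, size=None):
    """Draw s_{t+delta} given s_t = x, using the noncentral chi-squared transition law."""
    c = 2.0 * b / np.expm1(b * delta)         # c = 2b / (exp(b * delta) - 1)
    nonc = 2.0 * c * x * np.exp(b * delta)    # noncentrality parameter
    # 2c * s_{t+delta} ~ noncentral chi-squared with 4a degrees of freedom
    return rng.noncentral_chisquare(4.0 * a, nonc, size=size) / (2.0 * c)


def simulate_state(a, b, delta, n_obs, rng):
    """Simulate n_obs values of the state, started from the invariant gamma law."""
    s = np.empty(n_obs)
    # Invariant distribution: gamma with shape 2a and rate -2b (scale = 1 / rate)
    s[0] = rng.gamma(shape=2.0 * a, scale=1.0 / (-2.0 * b))
    for i in range(1, n_obs):
        s[i] = cir_exact_step(s[i - 1], a, b, delta, rng)
    return s


rng = np.random.default_rng(1)
path = simulate_state(0.5, -0.0137, 1.0 / 52.0, 1200, rng)
```

Because the transition law is sampled exactly, no discretization error is introduced, regardless of the size of $\Delta$.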



We simulate 500 series of 1200 observations of weekly data with $\Delta = 1/52$ for the 
yields of five zero-coupon bonds with maturities 1 month, 2 years, 4 years, 6 years, 
and 8 years, respectively. For each simulated series, we set the last 150 observations as 
the out-sample data. For each out-sample data point at time $t$, the time-domain estimator 
is based on the past $n = 104$ observations (two years), i.e., observations from $t - 104$ to 
$t - 1$; and the state-domain estimator is based on the 1050 data points right before the 
current time, i.e., the data points from time $t - 1050$ to $t - 1$. The first yield process 
(1-month) is used as the factor for state-domain estimation. 
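
The rolling time-domain window can be sketched as follows. This is an illustration only: it assumes the time-domain estimator is an exponentially weighted average of outer products with normalized weights $(1-\lambda)\lambda^{i-1}/(1-\lambda^n)$, with $\lambda = 0.94$ and $n = 104$ as used in this section, and that `Y` holds standardized increments (a hypothetical input); the function names are not from the paper.

```python
import numpy as np


def exp_weights(n=104, lam=0.94):
    """Normalized exponential weights (1 - lam) * lam**(i - 1) / (1 - lam**n), i = 1..n."""
    i = np.arange(1, n + 1)
    return (1.0 - lam) * lam ** (i - 1) / (1.0 - lam ** n)


def time_domain_estimate(Y, t, n=104, lam=0.94):
    """Weighted average of Y_{t-i} Y_{t-i}^T over i = 1..n (ASSUMED estimator form).

    Y is a (T, d) array; row t - 1 is the most recent observation used.
    """
    w = exp_weights(n, lam)
    window = Y[t - n:t][::-1]                 # rows t-1, t-2, ..., t-n
    return (window * w[:, None]).T @ window   # sum_i w_i * y_i y_i^T


# Rolling use over the last 150 points of a series of length 1200.
rng = np.random.default_rng(3)
Y = rng.standard_normal((1200, 5))            # placeholder data
estimates = [time_domain_estimate(Y, t) for t in range(1050, 1200)]
```

The weight on the oldest observation relative to the newest is $\lambda^{n-1}$, which is why a two-year window already captures nearly all of the effective data when $\lambda = 0.94$.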

As pointed out in Section 1, the conditional covariance matrix of the multivariate 
diffusion can be approximated by the diffusion matrix times the sampling interval $\Delta$. 
Hence, we first obtain estimates of the diffusion matrix, and then convert them into 
the conditional covariance matrix estimates. The theoretical value of the conditional 
variance of $s_t$ is given by Duffee (2002). Since the bond yields are linear regression 
models of the state variable (see (27)) with Gaussian errors, the true (theoretical) 
value of the conditional covariance matrix of the bond yields can be easily obtained. 
By comparing the estimated conditional covariance matrix to its theoretical value, the 
performance of our estimation procedures is evaluated. 
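
To illustrate the approximation mentioned above, the sketch below compares the closed-form conditional variance of a square-root process with the diffusion coefficient times $\Delta$. It assumes the state follows $ds_t = (a + b s_t)\,dt + \sqrt{s_t}\,dW_t$; the closed form used is the standard result for this process, which may differ in notation from the expression in Duffee (2002), and the function name is hypothetical.

```python
import numpy as np


def cir_conditional_variance(x, a, b, delta):
    """Var(s_{t+delta} | s_t = x) for ds = (a + b*s) dt + sqrt(s) dW."""
    em1 = np.expm1(b * delta)                  # exp(b * delta) - 1
    return x * np.exp(b * delta) * em1 / b + a * em1**2 / (2.0 * b**2)


a, b, delta = 0.5, -0.0137, 1.0 / 52.0
x = 36.5                                       # a value near the stationary mean -a/b
exact = cir_conditional_variance(x, a, b, delta)
approx = x * delta                             # diffusion coefficient (= x) times delta
rel_gap = abs(exact - approx) / exact
```

With weekly sampling the two values agree to well within one percent, which is consistent with using the diffusion matrix times $\Delta$ as a proxy for the conditional covariance.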

Figure 3 here. 

Figure 3 depicts the averages and standard deviations of the entropy and quadratic 
losses of time-domain, state-domain, and aggregated estimators. It shows unambiguously 
that the aggregated method always has the smallest averages and standard deviations 
across 500 simulations for both the entropy loss and quadratic loss. Figures 
4(a) and 4(b) summarize the distributions of the average losses over 150 out-sample 
forecasts across the 500 simulations. The results are consistent with those in Figure 
3. On the other hand, if the PE in (22) with $m = 150$ is used, the distributions look 
quite different, as demonstrated in Figure 4(c). It shows clearly that even though 
there are huge efficiency improvements in estimating the volatility matrix by using the 
aggregated method, the improvements are masked by stochastic errors, which are an 
order of magnitude larger than the estimation errors. The average prediction errors 
over 500 simulations are $1.850 \times 10^{-2}$, $1.825 \times 10^{-2}$, and $1.846 \times 10^{-2}$ for the 
time-domain, the aggregated, and the state-domain estimators, respectively. This 
demonstrates that a small improvement in PE means a huge improvement in the estimation of 
the volatility matrix. This effect is illustrated more clearly in Figure 4(d), where 
each point represents a simulation. The x-axis represents the ratios of the averages of 
150 quadratic losses for the time-domain estimator and the state-domain estimator to 
those for the aggregated estimator, whereas the y-axis represents the ratios of the PEs for 
the time-domain estimator and the state-domain estimator to those for the aggregated 
estimator. The x-coordinates are mostly greater than 1, showing the improved efficiency 
of the aggregated estimation. On the other hand, the improved efficiency is masked by 
stochastic errors, resulting in the y-coordinates spreading around the line $y = 1$. 

without noise. Since the noise vectors are i.i.d. across time and the standard deviations are small, 
adding them to the original time series does not change the whole structure. Hence, our theory can 
carry through under contamination. 

With $\lambda = 0.94$, the last data point used in the time domain has an extra weight $0.94^{104} \approx 0.0016$, 
which is very small. Hence, we essentially include all the effective data points. 

Figure 4 here. 

We have proved theoretically that nonparametric estimators based on time-domain 
smoothing and state-domain smoothing are asymptotically independent. To verify 
this, we compute their correlation coefficients. Since both estimators are matrices, for 
a given portfolio allocation vector $a$, we compute the correlation of the two estimators 
$a^T \hat{\Sigma}_{T,t}\, a$ and $a^T \hat{\Sigma}_{S,t}\, a$ across 500 simulations at each given time $t$ in the out-sample. 
Figure 5 presents the correlation coefficients for $a = (0.2, 0.2, 0.2, 0.2, 0.2)^T$. Most of 
the correlations are below 0.1, which strongly supports our theoretical result. We also 
include the 95% confidence intervals based on the Fisher transformation in the same 
graph (the two dashed curves). A large number of these confidence intervals contain 
0. The two straight lines in the plot indicate the acceptance region for testing the 
null hypothesis that the correlation coefficients are zero at the significance level 5%. 
Most of these null hypotheses are accepted or nearly accepted. In fact, we conducted 
experiments on the same simulations with larger sample sizes, and found that as the 
sample size increases, the absolute values of the correlation coefficients decrease to 0. 
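
The Fisher-transformation intervals mentioned above can be sketched as follows. This is a generic illustration, assuming the usual large-sample result that $\operatorname{arctanh}(r)$ is approximately normal with standard error $1/\sqrt{n-3}$; whether the acceptance lines in Figure 5 were computed exactly this way is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_interval(r, n, level=0.95):
    """Confidence interval for a correlation via the Fisher z-transformation."""
    z = math.atanh(r)
    half = NormalDist().inv_cdf(0.5 + level / 2.0) / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)


def zero_correlation_bound(n, alpha=0.05):
    """Sample correlations with |r| below this bound do not reject zero correlation."""
    return math.tanh(NormalDist().inv_cdf(1.0 - alpha / 2.0) / math.sqrt(n - 3))


# With 500 simulations per time point, as in the text:
low, high = fisher_interval(0.05, 500)
bound = zero_correlation_bound(500)
```

Under these assumptions the acceptance bound for 500 replications is a little below 0.09, so sample correlations under 0.1 are at most borderline rejections of the zero-correlation null.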

Figure 5 here. 

5.2 Empirical Studies 

In this section, we apply the aggregated method to two sets of financial data. Our aim is 
to examine whether our approach still outperforms the time-domain and state-domain 
nonparametric estimators in real applications. 

5.2.1 Treasury Bonds 

We consider the weekly returns of five treasury bonds with maturities 3 months, 2 years, 
5 years, 7 years, 10 years, and 30 years. We set the last 150 observations, which run 
from April 9, 1999 to February 15, 2002, as the out-sample data. For each observation 
from the out-sample data, we use the past 104 observations (2 years) with $\lambda = 0.94$ 
to obtain the time-domain estimator, and the state-domain estimator is based on the 
past 900 data points. The prediction error (Measure 3) and adaptive prediction error 
(Measure 4) are used to assess the performance of the three estimators: the time-domain 
estimator, the state-domain estimator, and the aggregated estimator. The results are 
reported in Table 1. From the table, we see that the aggregated estimator significantly 
outperforms the other two estimators. 

For comparison, the results from the simulated data are also reported. Even though 
there is only a small improvement in PE for simulated data, as evidenced in Section 
4.1, there is a huge improvement in the precision of estimating $\Sigma$ in terms of entropy 
loss (Measure 1) and quadratic loss (Measure 2). Hence, with the improvement of the 
PE in the bond price by the aggregated method, we would expect to have a huge 
improvement in the precision of the estimation of covariance, which is of primary 
interest in financial engineering. 
5.2.2 Exchange Rate 

We analyze the weekly exchange rates of five foreign currencies with US dollars from 
September 6, 1985 to August 19, 2005. The five foreign currencies are the Canadian 
Dollar, Australian Dollar, European Euro, UK British Pound, and Swiss Franc. 
The length of the time series is 1042. The exchange rates from December 6, 2002 to 
August 19, 2005, which are of length 142, are regarded as out-sample data, and the 
estimation procedures are the same as before, i.e., for each out-sample observation, the 
last 104 data points with $\lambda = 0.94$ are used to construct the time-domain estimator, the 
900 data points before the current time are used to construct the state-domain estimator, 
and then we roll over. The results, based on the PE and APE defined in Section 4, are 
also summarized in Table 1. They demonstrate clearly that the aggregated estimator 
outperforms the time-domain and state-domain estimators. 

Using again the simulated data for calibration, as argued at the end of Section 4.2.1, 
we would reasonably expect that the covariance matrix estimated by the aggregated 
method significantly outperforms the matrices estimated by either the time- or 
state-domain method alone. 

Table 1 here. 



6 DISCUSSIONS 

We have proposed an aggregated method to combine the information from the time 
domain and state domain in multivariate volatility estimation. To overcome the curse 
of dimensionality, we proposed a "factor" modeling strategy. The performance comparisons 
are studied both theoretically and empirically. We have shown that the proposed 
aggregated method is more efficient than the estimators based only on recent history 
or remote history. Our simulation and empirical studies have also revealed that proper 
use of information from both the time domain and the state domain makes the volatility 
matrix estimate more accurate. Our method exploits the continuity in the time domain 
and stationarity in the state domain. It can also be applied to situations where these 
two conditions hold approximately. 

11 Europe used several common currencies prior to the introduction of the Euro. The European 
Currency Unit (ECU) was used from January 1, 1979 to January 1, 1999, when the Euro replaced the 
European Currency Unit at par. 

Our study has also revealed another potentially important application of our method. 
It allows us to test the stationarity of diffusion processes. When time-domain estimates 
differ substantially from those of the state domain, it is an indication that the process 
is not stationary. Since the time-domain and state-domain nonparametric estimators 
are asymptotically independent and normal, formal tests can be constructed. Further study 
on this topic is beyond the scope of this paper. 
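
As a purely illustrative sketch of how such a test might look (this is not a procedure developed in the paper), one naive possibility is a z-type statistic for a single scalar functional of the volatility matrix. It assumes that both estimators target the same quantity with negligible bias, that they are approximately independent and normal, and that variance estimates `var_time` and `var_state` are available; all names below are hypothetical.

```python
import math
from statistics import NormalDist


def discrepancy_z_test(est_time, est_state, var_time, var_state):
    """Naive two-sided z-test for equality of two independent, normal estimates.

    Independence lets the variance of the difference be the sum of the variances.
    Returns the z statistic and its two-sided p-value.
    """
    z = (est_time - est_state) / math.sqrt(var_time + var_state)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical numbers: a large standardized gap hints at non-stationarity.
z, p = discrepancy_z_test(est_time=2.0, est_state=0.0, var_time=0.5, var_state=0.5)
```

A formal procedure would also have to account for smoothing bias and for testing many time points or matrix entries at once, which this sketch ignores.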



APPENDIX: PROOFS 

A.1 Proof of Proposition 1 

In all the proofs below, we use $M$ to denote a generic constant. 

First, we show that the process $\{f_t\}$ is locally Hölder continuous with order 
$q = (2q_0 - 1)/(4q_0)$ and coefficient $K_1$ satisfying $E[K_1^{4(q_0+\delta)}] < \infty$, i.e. 

$$|f_s - f_u| \le K_1 |s - u|^q, \quad s, u \in [t - \eta, t], \tag{A.1}$$

where $\eta$ is a positive constant. Note that 

$$E|f_u - f_s|^{4(q_0+\delta)} \le M E\Big|\int_s^u a(f_v)\,dv\Big|^{4(q_0+\delta)} + M E\Big|\int_s^u \sum_j b_j(f_v)\,dW_v^j\Big|^{4(q_0+\delta)} \equiv (I) + (II). \tag{A.2}$$

Then by Jensen's inequality and Assumption 2, we have 

$$(I) \le M(u - s)^{4(q_0+\delta)-1}\int_s^u E|a(f_v)|^{4(q_0+\delta)}\,dv \le M(u - s)^{4(q_0+\delta)}. \tag{A.3}$$

On the other hand, applying martingale moment inequalities (see, e.g., Karatzas and 
Shreve (1991), Section 3.3.D, p. 163), Jensen's inequality, and Assumption 2 gives 

$$(II) \le M E\Big(\int_s^u \sum_j b_j^2(f_v)\,dv\Big)^{2(q_0+\delta)} \le M(u - s)^{2(q_0+\delta)-1}\int_s^u \sum_j E|b_j(f_v)|^{4(q_0+\delta)}\,dv \le M(u - s)^{2(q_0+\delta)}. \tag{A.4}$$

Combining (A.2), (A.3) and (A.4) together leads to 

$$E|f_u - f_s|^{4(q_0+\delta)} \le M(u - s)^{2(q_0+\delta)}.$$

Thus by Theorem 2.1 of Revuz and Yor (1999, p. 26), we have 

$$E\Big[\Big(\sup_{s \ne u}\{|f_s - f_u| / |s - u|^{\alpha}\}\Big)^{4(q_0+\delta)}\Big] < \infty \tag{A.5}$$

for any $\alpha \in \big[0, \frac{2(q_0+\delta)-1}{4(q_0+\delta)}\big)$. Let $\alpha = \frac{2q_0-1}{4q_0}$ and $K_1 = \sup_{s \ne u}\{|f_s - f_u| / |s - u|^q\}$. 
Then $E[K_1^{4(q_0+\delta)}] < \infty$, and inequality (A.1) holds. 

Second, by (8) we have 

$$\|\sigma(f_s) - \sigma(f_u)\| \le k_0 |f_s - f_u|.$$

This together with (A.1) shows that 

$$\|\sigma(f_s) - \sigma(f_u)\| \le k_0 K_1 |s - u|^q \equiv K|s - u|^q.$$

Hence, $E[K^{4(q_0+\delta)}] \le M E[K_1^{4(q_0+\delta)}] < \infty$. ■ 

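A.2 Proof of Theorem 1 

Proof. At time $s$, for fixed $k$, $\ell$, and $i$, define $Z_s^{k,\ell} = (X_s^k - X_{t_i}^k)(X_s^\ell - X_{t_i}^\ell)$. Applying 
Itô's formula to $Z_s^{k,\ell}$ results in 

$$\begin{aligned}
dZ_s^{k,\ell} &= (X_s^k - X_{t_i}^k)\,dX_s^\ell + (X_s^\ell - X_{t_i}^\ell)\,dX_s^k + \sum_{j=1}^m \sigma_{kj}(f_s)\sigma_{\ell j}(f_s)\,ds \\
&= \big[(X_s^k - X_{t_i}^k)\mu_\ell(f_s) + (X_s^\ell - X_{t_i}^\ell)\mu_k(f_s)\big]\,ds \\
&\quad + \Big[\int_{t_i}^s e_k^T\mu(f_u)\,du\; e_\ell^T\sigma(f_s) + \int_{t_i}^s e_\ell^T\mu(f_u)\,du\; e_k^T\sigma(f_s)\Big]\,dW_s \\
&\quad + \Big[\int_{t_i}^s e_k^T\sigma(f_u)\,dW_u\; e_\ell^T\sigma(f_s) + \int_{t_i}^s e_\ell^T\sigma(f_u)\,dW_u\; e_k^T\sigma(f_s)\Big]\,dW_s \\
&\quad + \sum_{j=1}^m \sigma_{kj}(f_s)\sigma_{\ell j}(f_s)\,ds.
\end{aligned}$$

Hence, $Y_i^k Y_i^\ell$ can be decomposed as 

$$Y_i^k Y_i^\ell = \Delta^{-1} Z_{t_{i+1}}^{k,\ell} = a_i^{k,\ell} + b_i^{k,\ell} + v_i^{k,\ell},$$

where 

$$\begin{aligned}
a_i^{k,\ell} &= \Delta^{-1}\int_{t_i}^{t_{i+1}} \big[(X_s^k - X_{t_i}^k)\mu_\ell(f_s) + (X_s^\ell - X_{t_i}^\ell)\mu_k(f_s)\big]\,ds \\
&\quad + \Delta^{-1}\int_{t_i}^{t_{i+1}}\int_{t_i}^s \big[e_k^T\mu(f_u)\,du\; e_\ell^T\sigma(f_s) + e_\ell^T\mu(f_u)\,du\; e_k^T\sigma(f_s)\big]\,dW_s,
\end{aligned}$$

$$b_i^{k,\ell} = \Delta^{-1}\int_{t_i}^{t_{i+1}} \Big[\int_{t_i}^s e_k^T\sigma(f_u)\,dW_u\; e_\ell^T\sigma(f_s) + \int_{t_i}^s e_\ell^T\sigma(f_u)\,dW_u\; e_k^T\sigma(f_s)\Big]\,dW_s$$

and 

$$v_i^{k,\ell} = \Delta^{-1}\int_{t_i}^{t_{i+1}} \sum_{j=1}^m \sigma_{kj}(f_s)\sigma_{\ell j}(f_s)\,ds.$$

Correspondingly, (14) has the following decomposition 

$$\frac{1-\lambda}{1-\lambda^n}\sum_{i=1}^n \lambda^{i-1}\sum_{\ell \le k} c_{k\ell}\, a_i^{k,\ell} + \frac{1-\lambda}{1-\lambda^n}\sum_{i=1}^n \lambda^{i-1}\sum_{\ell \le k} c_{k\ell}\, b_i^{k,\ell} + \frac{1-\lambda}{1-\lambda^n}\sum_{i=1}^n \lambda^{i-1}\sum_{\ell \le k} c_{k\ell}\, v_i^{k,\ell} \equiv A_{n,\Delta} + B_{n,\Delta} + C_{n,\Delta}. \tag{A.6}$$

Therefore, Slutsky's lemma, together with Lemmas 1-3 below, leads to the conclusions 
of Theorem 1 immediately. ■ 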

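Lemma 1 Under Assumption 1, as $n \to \infty$, $n\Delta \to 0$, and $n(1 - \lambda) \to r$, we have 

$$E A_{n,\Delta}^2 = O(\Delta), \tag{A.7}$$

where $A_{n,\Delta} = \frac{1-\lambda}{1-\lambda^n}\sum_{i=1}^n \lambda^{i-1}\sum_{\ell \le k} c_{k\ell}\, a_i^{k,\ell}$ is as defined in (A.6). 

Proof. First, note that 

$$\begin{aligned}
E(a_i^{k,\ell})^2 &\le 2E\Big(\Delta^{-1}\int_{t_i}^{t_{i+1}} \big[(X_s^k - X_{t_i}^k)\mu_\ell(f_s) + (X_s^\ell - X_{t_i}^\ell)\mu_k(f_s)\big]\,ds\Big)^2 \\
&\quad + 2E\Big(\Delta^{-1}\int_{t_i}^{t_{i+1}}\int_{t_i}^s \big[e_k^T\mu(f_u)\,du\; e_\ell^T\sigma(f_s) + e_\ell^T\mu(f_u)\,du\; e_k^T\sigma(f_s)\big]\,dW_s\Big)^2 \\
&\equiv I_1(\Delta) + I_2(\Delta).
\end{aligned} \tag{A.8}$$

Applying Jensen's inequality and Hölder's inequality (Proposition 1), we obtain 

$$\begin{aligned}
I_1(\Delta) &\le M\Delta^{-1}\int_{t_i}^{t_{i+1}} E\big[(X_s^k - X_{t_i}^k)\mu_\ell(f_s) + (X_s^\ell - X_{t_i}^\ell)\mu_k(f_s)\big]^2\,ds \\
&\le M\Delta^{-1}\int_{t_i}^{t_{i+1}} \Big\{\big(E(X_s^k - X_{t_i}^k)^4 E[\mu_\ell(f_s)]^4\big)^{1/2} + \big(E(X_s^\ell - X_{t_i}^\ell)^4 E[\mu_k(f_s)]^4\big)^{1/2}\Big\}\,ds.
\end{aligned} \tag{A.9}$$

Since an application of Jensen's inequality, martingale moment inequalities and 
Assumption 2 results in 

$$\begin{aligned}
E(X_s^\ell - X_{t_i}^\ell)^4 &\le M\Big(E\Big[\int_{t_i}^s \mu_\ell(f_u)\,du\Big]^4 + E\Big[\sum_{j=1}^m \int_{t_i}^s \sigma_{\ell j}(f_u)\,dW_u^j\Big]^4\Big) \\
&\le M\Big((s - t_i)^3\int_{t_i}^s E[\mu_\ell(f_u)]^4\,du + \sum_{j=1}^m M(s - t_i)\int_{t_i}^s E[\sigma_{\ell j}(f_u)]^4\,du\Big) \\
&\le M(s - t_i)^2,
\end{aligned}$$

we see that (A.9) can be bounded as 

$$I_1(\Delta) \le M\Delta. \tag{A.10}$$

We now consider the second term $I_2(\Delta)$ in (A.8). By stochastic calculus and 
Jensen's inequality, we have 

$$\begin{aligned}
I_2(\Delta) &= 2\Delta^{-2}\int_{t_i}^{t_{i+1}} \sum_{j=1}^m E\Big(\int_{t_i}^s \big[\mu_k(f_u)\sigma_{\ell j}(f_s) + \mu_\ell(f_u)\sigma_{kj}(f_s)\big]\,du\Big)^2 ds \\
&\le M\Delta^{-1}\int_{t_i}^{t_{i+1}}\int_{t_i}^s \sum_{j=1}^m E\big[\mu_k(f_u)\sigma_{\ell j}(f_s) + \mu_\ell(f_u)\sigma_{kj}(f_s)\big]^2\,du\,ds \\
&= O(\Delta).
\end{aligned}$$

This together with (A.10) leads to $E(a_i^{k,\ell})^2 = O(\Delta)$. Therefore, by the Cauchy-Schwarz 
inequality and the assumption that $\lim n(1 - \lambda)$ exists, 

$$E A_{n,\Delta}^2 \le M n\Big(\frac{1-\lambda}{1-\lambda^n}\Big)^2 \sum_{i=1}^n \lambda^{2(i-1)}\sum_{\ell \le k} c_{k\ell}^2\, E(a_i^{k,\ell})^2 = O(\Delta),$$

which concludes the proof. ■ 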

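Lemma 2 Under Assumptions 1 and 2, as $n \to \infty$, $n\Delta^q \to 0$ and $n(1 - \lambda) \to r$, we 
have 

$$\sqrt{n}\, B_{n,\Delta} \xrightarrow{D} Z_c,$$

where $B_{n,\Delta}$ is defined in (A.6) and the random variable $Z_c$ is defined in Theorem 1. 

Proof. We will decompose $B_{n,\Delta}$ into two parts and prove that the first part is 
asymptotically negligible and the second part has some asymptotic distribution. 
Note that $b_i^{k,\ell}$ can be decomposed as 

$$b_i^{k,\ell} = B_i^{k,\ell} + C_i^{k,\ell}, \tag{A.11}$$

where 

$$B_i^{k,\ell} = \Delta^{-1}\sum_{j,p}\big(\sigma_{kj}(f_{t_0})\sigma_{\ell p}(f_{t_0}) + \sigma_{kp}(f_{t_0})\sigma_{\ell j}(f_{t_0})\big)\int_{t_i}^{t_{i+1}} (W_s^j - W_{t_i}^j)\,dW_s^p$$

and 

$$C_i^{k,\ell} = \Delta^{-1}\int_{t_i}^{t_{i+1}} \Big[\int_{t_i}^s e_k^T(\sigma(f_u) - \sigma(f_{t_0}))\,dW_u\; e_\ell^T\sigma(f_s) + \int_{t_i}^s e_k^T\sigma(f_u)\,dW_u\; e_\ell^T(\sigma(f_s) - \sigma(f_{t_0}))\Big]\,dW_s,$$

where $e_k$ is the unit vector with $k$th entry 1 and all other entries 0. Correspondingly, 
$B_{n,\Delta}$ is decomposed as 

$$B_{n,\Delta} = \frac{1-\lambda}{1-\lambda^n}\sum_{i=1}^n \lambda^{i-1}\sum_{\ell \le k} c_{k\ell}\, B_i^{k,\ell} + \frac{1-\lambda}{1-\lambda^n}\sum_{i=1}^n \lambda^{i-1}\sum_{\ell \le k} c_{k\ell}\, C_i^{k,\ell} \equiv B + C.$$

First, we show that $\sqrt{n}\,C$ is asymptotically negligible. To this end, note that by 
stochastic calculus and the triangle inequality, we have 

$$\begin{aligned}
E(C_i^{k,\ell})^2 &\le \Delta^{-2}\int_{t_i}^{t_{i+1}} \sum_{j=1}^m E\Big(\int_{t_i}^s e_k^T(\sigma(f_u) - \sigma(f_{t_0}))\,dW_u\; \sigma_{\ell j}(f_s)\Big)^2 ds \\
&\quad + \Delta^{-2}\int_{t_i}^{t_{i+1}} \sum_{j=1}^m E\Big(\int_{t_i}^s e_k^T\sigma(f_u)\,dW_u\; (\sigma_{\ell j}(f_s) - \sigma_{\ell j}(f_{t_0}))\Big)^2 ds \\
&\equiv \Delta^{-2}\int_{t_i}^{t_{i+1}} \sum_{j=1}^m I_j^{(1)}(\Delta)\,ds + \Delta^{-2}\int_{t_i}^{t_{i+1}} \sum_{j=1}^m I_j^{(2)}(\Delta)\,ds.
\end{aligned}$$

Applying Hölder's inequality yields 

$$I_j^{(1)}(\Delta) \le \Big(E\Big(\int_{t_i}^s e_k^T(\sigma(f_u) - \sigma(f_{t_0}))\,dW_u\Big)^4 E(\sigma_{\ell j}(f_s))^4\Big)^{1/2}, \tag{A.12}$$

and then by martingale moment inequalities and (15) we obtain 

$$E\Big(\int_{t_i}^s e_k^T(\sigma(f_u) - \sigma(f_{t_0}))\,dW_u\Big)^4 \le O(1)\,E\Big(\int_{t_i}^s \sum_{j=1}^m (\sigma_{kj}(f_u) - \sigma_{kj}(f_{t_0}))^2\,du\Big)^2 \le O\big((n\Delta + \Delta)^{4q}\Delta^2\big).$$

Hence, we can bound (A.12) as 

$$I_j^{(1)}(\Delta) \le O\big((n\Delta)^{2q}\Delta\big). \tag{A.13}$$

Next we consider $I_j^{(2)}(\Delta)$. Similarly, by Hölder's inequality, martingale moment 
inequalities, and (15) we have 

$$\begin{aligned}
I_j^{(2)}(\Delta) &\le \Big[E\Big(\int_{t_i}^s e_k^T\sigma(f_u)\,dW_u\Big)^4 E(\sigma_{\ell j}(f_s) - \sigma_{\ell j}(f_{t_0}))^4\Big]^{1/2} \\
&\le O(1)\Big(E\Big[\int_{t_i}^s \sum_{j=1}^m \sigma_{kj}^2(f_u)\,du\Big]^2 (n\Delta + \Delta)^{4q}\Big)^{1/2} \\
&\le O\big((n\Delta)^{2q}\Delta\big).
\end{aligned}$$

This together with (A.13) implies that 

$$E(C_i^{k,\ell})^2 = O\big((n\Delta)^{2q}\big).$$

Hence, it follows that 

$$E(\sqrt{n}\,C)^2 = O\big((n\Delta)^{2q}\big), \tag{A.14}$$

which means that $\sqrt{n}\,C$ is asymptotically negligible. 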

Next, we consider the term ^JnB. We first define the augmented filtration Tt- Let 
(Q,T,P) be the probability space in which the Brownian motion {Wj,0 <t< 00} is 
defined, and Xq is the initial value of model (|4]) and independent of Too- Define the 
left-continuous filtration Qt = a(Xo) V {T^,0 < t < 00} as well as the collection of 
null sets M = {N e n-,3G e Goo with N C G and P(G) = 0}. Then the augmented 
filtration is defined as T = cr{Gt U N), < t < 00; Too = cr([j t>0 Tt)- First note 
that by stochastic calculus we have E[B k,e \To] = and for i 7^ j, B 1 *' and Bj ,e are 
independent. Therefore, we only need to verify the conditions of the central limit 
theorem for the martingale difference array (see, e.g. Hall and Heyde (1980), Corollary 
3.1, P.58); namely, we need to check 

E K4^ A "' E <*«M*.)' - ^zt 2 ^ WO • W)V* 

(A.15) 



i=l £<fc 



and 

1 - A 
1 - A r 

8=1 £<fe 



1 \ 
Expression (A.15) gives the asymptotic conditional variance of $\sqrt{n}\,B$ and (A.16) implies 
the conditional Lindeberg condition. These two conditions lead to 

$$\sqrt{n}\,B \xrightarrow{D} Z_c, \tag{A.17}$$

where the random variable $Z_c$ is defined as in Theorem 1. 

We first prove (A.15). From stochastic calculus we know that $E[B_i^{k,\ell} \mid \mathcal{F}_{t_i}] = 0$ and 
for $i \ne j$, $B_i^{k,\ell}$ and $B_j^{k,\ell}$ are independent. Moreover, by (15) we have 

Hi 



E[B^Bf^ ti ] =A" 2 ^H^(f t0 )H^(f t0 ) [ 1+1 E(Wi - W^ds 

3,9 Jti 

IzZ" 1 ;,;' ifn'l'fU)) + o g ((nA + A) 9 ), 



2 

J ,9 

where Hj' g (x) = akj(x)ai g (x) + ak g {x)a^{x). It follows that 

var(^ Qjfe Bf = c T P D (2£(/ to ) £(/ to ))P£c 

^c T P D (2S(/ t )»S(/ t ))Pi;c. 

Therefore, we get the following result for the conditional variance of the left hand side 
of ([ATS) : 

f /^(l-A) R M, r \ 2 n(l-A)(l + A n ) ^ lM 

^ V 1-A" ^ 1 V (1 + A)(l-A n ) 1 1 w 

i=i i<k 1 e<k 

e r — 1 

where $r = \lim_{n\to\infty} n(1 - \lambda)$. This verifies (A.15). 

Then we show (A.16). Straightforward calculations yield 



£[(E c ^) 4 N = o(i)E c4 ^[(^ M ) 4 i^] 

, A 



Kk £<k 

rti+1 



£<k j,g Jti 

=o(i)£4£(i4U>)) 4 - 

l<k j,g 
This together with Assumption 2 and Hölder's inequality leads to 
n 1 , 

t=l £<fc £<fc i,g 

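which proves (A.16). (A.17) holds in consequence. Combining (A.14) and (A.17) and 
applying Slutsky's lemma, we obtain the conclusion in Lemma 2. ■ 

Lemma 3 Under Assumptions 1 and 2, as $n \to \infty$ and $n\Delta^q \to 0$, the following result 
holds for $C_{n,\Delta}$ defined in (A.6): 

$$E\big|C_{n,\Delta} - c^T \mathrm{vech}(\Sigma(f_t))\big| = O\big((n\Delta)^q\big). \tag{A.18}$$

Proof. Note that 

$$\begin{aligned}
E\Big|C_{n,\Delta} - \sum_{\ell \le k} c_{k\ell}\, v_{k\ell,t}\Big| &= E\Big|\frac{1-\lambda}{1-\lambda^n}\sum_{i=1}^n \lambda^{i-1}\sum_{\ell \le k} c_{k\ell}\,(v_i^{k,\ell} - v_{k\ell,t})\Big| \\
&\le \frac{1-\lambda}{1-\lambda^n}\sum_{i=1}^n \lambda^{i-1}\sum_{\ell \le k} |c_{k\ell}|\; E\big|v_i^{k,\ell} - v_{k\ell,t}\big|.
\end{aligned}$$

Thus we only need to consider the asymptotic property of $E|v_i^{k,\ell} - v_{k\ell,t}|$. By the 
Cauchy-Schwarz inequality and Hölder's inequality, we have 

$$\begin{aligned}
E\big|v_i^{k,\ell} - v_{k\ell,t}\big| &\le \Delta^{-1}\sum_{j=1}^m \int_{t_i}^{t_{i+1}} \Big\{E\big|\sigma_{kj}(f_t)(\sigma_{\ell j}(f_t) - \sigma_{\ell j}(f_s))\big| + E\big|(\sigma_{kj}(f_t) - \sigma_{kj}(f_s))\sigma_{\ell j}(f_s)\big|\Big\}\,ds \\
&\le \Delta^{-1}\sum_{j=1}^m \int_{t_i}^{t_{i+1}} \Big\{\big[E\sigma_{kj}^2(f_t)\, E(\sigma_{\ell j}(f_t) - \sigma_{\ell j}(f_s))^2\big]^{1/2} + \big[E(\sigma_{kj}(f_t) - \sigma_{kj}(f_s))^2\, E\sigma_{\ell j}^2(f_s)\big]^{1/2}\Big\}\,ds.
\end{aligned}$$

Therefore by (15) and Assumption 2, 

$$E\big|v_i^{k,\ell} - v_{k\ell,t}\big| = O\big((n\Delta + \Delta)^q\big) = O\big((n\Delta)^q\big).$$

This proves (A.18). ■ 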

A.3 Proof of Proposition 2 

Lemma 4 Since $f_t$ is a stationary real ergodic process, we have 

$$\frac{L_f(T, x)}{\sum_j b_j^2(x)\, T} \xrightarrow{a.s.} p(x),$$

where $p(x)$ is the time-invariant density function of the process $f_t$ at $x$. 

Proof. See Bandi and Phillips (2003) and Bosq (1998, Theorem 6.3, p. 150). 



Lemma 5 Suppose $\Delta \to 0$, $N\Delta \to \infty$, and $\frac{1}{h}\sqrt{\Delta\log\Delta^{-1}} = o(1)$. Under Assumptions 
3-5, we have for $\ell = 0, 1, 2, 3$, 

$$W_\ell(x) = \frac{1}{\Delta}\int_{t_0}^{t_{N-1}} (f_s - x)^\ell K_h(f_s - x)\,ds + N h^{\ell-1}\, O_{a.s.}\big(\sqrt{\Delta\log\Delta^{-1}}\big).$$

Proof. First, note that for any nonnegative integer $\ell < 4$, we have 
~ X E / if>- x ^ K( l± ^^ds 



k=0 



< 



i N-l ,. t .., j> 

(A, -.J*"* -1 



hA 



k=0 



h 



{fs-xfK 



i Tsl f s ~ X 



ds 



<h + h 



(A.19) 



with 



and 



k=0 



N-l 



A" 



ij^ks X 
~h 



/? 



|/ t . - xfds 



(A.20) 



(A.21) 



fc=0 ifc 

where ffc s and Tks are both values on the line segment connecting ft k to f s . Now define 



$$\kappa_{N,\Delta} = \max_{k}\ \sup_{t_k \le s \le t_{k+1}} |f_s - f_{t_k}|.$$

Then, by Lévy's modulus of continuity of diffusions (see, e.g., Revuz and Yor (1998, 
Ch. V, Exercise 1.20)), 

$$P\Big(\limsup_{\Delta \to 0} \frac{\kappa_{N,\Delta}}{\sqrt{\Delta\log\Delta^{-1}}} = a\Big) = 1, \tag{A.22}$$

where $a$ is a suitable constant. In turn, (A.22) implies that 



This together with the assumption that i \J A log A 1 = o(l) leads to 

o a . s .(l) as iVA — ► oo. 



/? 

In view of (|AT20|) and (jAT21~j) . we have 



and 



r ks -x = h(-^ h o a . s .(l)), 

uniformly over k = 0, ■ • • , N — 1. Hence, by Lemma H] and Revuz and Yor (1999), 
Exercise 1.15 and Corollary 1.6 of Chapter 6, we obtain that (IA.20|) can be bounded as 

a-i tf-i r t k+1 



KNA_h l ^ r k+1 \^i ( fs-x , ^n /.-z , 

h< — - r l^J \K(^ r + O a . s .(m\—f r + Oa.s.(l)\ds 

k=0 Jtk 

=NAh<-^ f \K'( y -^ + o a .,(l)) + o a .,(l) 



Nh i^NA I + 0m .(1)) | |u + 0a ...(l)f(p(ufc + + o a . s .(l))du. 



/l 

This together with (|9|) yields 



/i <AT^O a . s .(iVAlogA-i) 



Similarly, we can show that COB is also bounded by Nh e O a . s .(^A log A" 1 ) . This 
proves the stated results. ■ 

Proof of Proposition 2 

Since x K{x) is a positive function, by Exercise 1.15 and Corollary 1.6 of Chapter 6 
of Revuz and Yor (1999), and LemmaH] above we have for £ = 0, 1, 

1 f^ 1 f fs -X,2l„ ( fs-X, 



NA 



2/ - x,2i ,y - x. L r (t N -x,y) 

K ^NAEb](y) dV 

h J u 2e K(u)(p(uh + x)+o a . s .(l))du 

h(p{x)n 2 e + Oa.s.(l)), 
where we have used $\mu_4 = \int x^4 K(x)\,dx < \infty$. This together with Lemma 5 leads to 

1 1 ftN-l 

jjWuix) = m J Us ~ x? l K h {fs ~ x)ds + Oo . s .(l) (A.23) 



=hr(p{x)n2t + o a . s .(l)). 

Let s(dx) = exp| -^^(y) ^} J2^l x ) ^ e ^ e s P eec ^ measure °f /*■ By the Quotient 
theorem (Revuz and Yor (1999), Theorem 3.12, Chapter 10, p. 427), 



, / , \ 2£+l 

JaU:- 1 (V) K h (f s -x)ds _ f Kh (y-x)8(dy) ^ m 



fl"- 1 K h Us ~ x)ds fK h (y - x)s(dy) 



+ O a . s .(l) 



NA J to 

14) 

as NA — > 00. In turn, this implies that 

W 2£+1 (x)/h^ _ j-gT {^r) 2e+1 K h (fs - x)ds + NO a . s .(^f^) 
W0{X) i fir K h (f s - x)ds + NO a . s .(^f^) 

= ^+o a .,(l). 

Combining (|A.23|) and (|A.24|> . we obtain 

WWx) = iV/t 2 ^ 1 ^^)^! + Oo . a . (I))- 

This completes the proof. ■ 
A.4 Proof of Theorem 2 

Let $M(f_{t_k}) = E[Y_k Y_k^T \mid f_{t_k}]$. Then the matrix function $M(y)$ can be expanded around 
a fixed point $x$ as 

$$M(y) = A_0 + A_1(y - x) + A_2(y - x)^2 + A_3(y - x)^3 + \cdots,$$

where $A_0, A_1, \ldots$ are all matrices. To prove the asymptotic property of the 
state-domain estimator, let us decompose it as 

$$\hat{\Sigma}_{S,t}(x) - M(x) = \sum_{k=0}^{N-1} w_k(x)\big(M(f_{t_k}) - M(x)\big) + \sum_{k=0}^{N-1} w_k(x)\big(Y_k Y_k^T - M(f_{t_k})\big) \equiv \mathbf{b} + \mathbf{t}. \tag{A.25}$$

First, we establish the asymptotic behavior of the bias term $\mathbf{b}$. Applying Taylor's 
expansion and Proposition 2 results in 

$$\begin{aligned}
\mathbf{b} &= \sum_{k=0}^{N-1} w_k(x)\big(M(f_{t_k}) - M(x)\big) \\
&= \sum_{k=0}^{N-1} w_k(x) A_1 (f_{t_k} - x) + \sum_{k=0}^{N-1} w_k(x) A_2 (f_{t_k} - x)^2 + o_{a.s.}(h^3) \\
&= h^2\mu_2 A_2 + o_{a.s.}(h^2).
\end{aligned}$$

Since we have the following decomposition 

$$\hat{\Sigma}_{S,t}(x) - \Sigma(x) = \big(\hat{\Sigma}_{S,t}(x) - M(x)\big) + \big(M(x) - \Sigma(x)\big) = \big[\mathbf{b} + (M(x) - \Sigma(x))\big] + \mathbf{t},$$

and $M(x) - \Sigma(x) = o_p(\Delta)$, the asymptotic bias of the state-domain estimator is 

$$\mathbf{b} + (M(x) - \Sigma(x)) = \frac{1}{2}h^2\mu_2\Sigma''(x) + o_{a.s.}(h^2) + o_p(\Delta). \tag{A.26}$$

Then, let us consider the variance term $\mathbf{t}$. Since $\mathbf{t}$ is a matrix, we first vectorize it 
and then consider the asymptotic normality of its linear combination, i.e. we look at 
the statistic 

$$t = a^T \mathrm{vech}\Big(\sum_{k=0}^{N-1} w_k(x)\big(Y_k Y_k^T - M(f_{t_k})\big)\Big),$$

where $a$ is a constant vector. By Proposition 2, 

$$t = \frac{1}{N p(x)}\sum_{k=0}^{N-1} K_h(f_{t_k} - x)\, a^T \mathrm{vech}\big(Y_k Y_k^T - M(f_{t_k})\big)\{1 + o_{a.s.}(1)\} \equiv A_N\{1 + o_{a.s.}(1)\}. \tag{A.27}$$

Therefore, we only need to show the asymptotic normality of $A_N$. To this end, first let 
$\vartheta_{N,k} = K_h(f_{t_k} - x)\, a^T \mathrm{vech}(Y_k Y_k^T - M(f_{t_k}))$. Then $A_N = \frac{1}{N p(x)}\sum_{k=0}^{N-1} \vartheta_{N,k}$. 
Straightforward calculations give 

$$\begin{aligned}
\mathrm{var}(\vartheta_{N,k}) &= E\big(K_h(f_{t_k} - x)\, a^T \mathrm{vech}(Y_k Y_k^T - M(f_{t_k}))\big)^2 \\
&= E\big\{K_h^2(f_{t_k} - x)\, E\big[(a^T \mathrm{vech}(Y_k Y_k^T - M(f_{t_k})))^2 \mid f_{t_k}\big]\big\} \\
&= 2E\big\{K_h^2(f_{t_k} - x)\,\big(a^T P_D\, \Sigma(f_{t_k}) \otimes \Sigma(f_{t_k})\, P_D^T a\big)\big\} \\
&= 2h^{-1}\nu_0\, p(x)\, a^T P_D\, \Sigma(x) \otimes \Sigma(x)\, P_D^T a\,(1 + o(1)),
\end{aligned} \tag{A.28}$$

where the last step follows from Taylor's expansion. 

Note that $Y_\ell$ only depends on the sample path of $f_t$ over time interval $[t_\ell, t_{\ell+1}]$. 
Thus by conditioning on $\mathcal{F}_{t_\ell}$, we obtain 

$$\mathrm{cov}(\vartheta_{N,1}, \vartheta_{N,\ell}) = E\big[\vartheta_{N,1} K_h(f_{t_\ell} - x)\, E\big(a^T \mathrm{vech}(Y_\ell Y_\ell^T - M(f_{t_\ell})) \mid \mathcal{F}_{t_\ell}\big)\big] = 0, \quad \ell > 1. \tag{A.29}$$

Combining (A.28) and (A.29) entails 

$$\mathrm{var}(A_N) = \frac{2\nu_0}{N h\, p(x)}\, a^T P_D\, \Sigma(x) \otimes \Sigma(x)\, P_D^T a\,(1 + o(1)).$$

Since a stationary Markov process satisfying the G2 condition of Rosenblatt (1970) 
is $\rho$-mixing, we can use "big-block and small-block" arguments similar to those used 
by Fan and Yao (2003, Theorem 2.22, p. 77) to prove the asymptotic normality of $A_N$. 
The lengthy details are omitted here. Thus, 

$$\sqrt{Nh}\, A_N \xrightarrow{D} \mathcal{N}\big(0,\ 2\nu_0\, p(x)^{-1}\, a^T P_D\, \Sigma(x) \otimes \Sigma(x)\, P_D^T a\big).$$

This together with (A.26) and (A.27) implies the asymptotic normality of the 
state-domain estimator, i.e. 

$$\sqrt{Nh}\, a^T \mathrm{vech}\Big(\hat{\Sigma}_{S,t}(x) - \Sigma(x) - \frac{1}{2}h^2\mu_2\Sigma''(x)\Big) \xrightarrow{D} \mathcal{N}\big(0,\ 2\nu_0\, p(x)^{-1}\, a^T \Lambda(x)\, a\big),$$

where $a$ is an arbitrary constant vector. This completes the proof. ■ 



A.5 Proof of Theorem 3 

We only need to show the asymptotic normality of the linear combination 

$$\sqrt{Nh}\, a^T \mathrm{vech}\Big(\hat{\Sigma}_{S,t} - \Sigma(x) - \frac{1}{2}h^2\mu_2\Sigma''(x)\Big) + \sqrt{n}\, c^T \mathrm{vech}\big(\hat{\Sigma}_{T,t} - \Sigma(x)\big),$$

where $a$ and $c$ are two constant vectors. This is equivalent to showing the joint 
asymptotic normality of $\sqrt{Nh}\, a^T \mathrm{vech}(\hat{\Sigma}_{S,t} - \Sigma(x) - \frac{1}{2}h^2\mu_2\Sigma''(x))$ and $\sqrt{n}\, c^T \mathrm{vech}(\hat{\Sigma}_{T,t})$. 
From the proof of Theorem 2, we have 

$$a^T \mathrm{vech}\Big(\hat{\Sigma}_{S,t} - \Sigma(x) - \frac{1}{2}h^2\mu_2\Sigma''(x)\Big) = a^T \mathrm{vech}(\mathbf{t}) + o_p(1) = t + o_p(1) = A_N\{1 + o_{a.s.}(1)\} + o_p(1),$$

where $\mathbf{t}$, $t$ and $A_N$ are all defined in the proof of Theorem 2. Therefore, we need only 
consider the asymptotic normality of $\sqrt{Nh}\, A_N$ and $\sqrt{n}\, c^T \mathrm{vech}(\hat{\Sigma}_{T,t})$. 

We truncate $A_N$ by defining 

$$\tilde{A}_N = \frac{1}{N p(x)}\sum_{k=0}^{N - a_N} \vartheta_{N,k},$$

where $a_N$ is an integer depending only on $N$ and satisfying $a_N/N \to 0$ and $a_N\Delta \to \infty$. 
We are going to show that: 

(i) $\tilde{A}_N$ and $\sqrt{n}\, c^T \mathrm{vech}(\hat{\Sigma}_{T,t})$ are asymptotically independent; 

(ii) $A_N - \tilde{A}_N$ is asymptotically negligible. 

We first prove (i). Since a stationary Markov process satisfying the G2 condition of 
Rosenblatt (1970) is $\rho$-mixing with exponentially decaying $\rho$-mixing coefficient $\rho(\cdot)$, 
and the strong-mixing coefficient $\alpha(\ell) \le \rho(\ell)$ for any integer $\ell$, it follows that 

$$\Big|E\exp\{i\xi(\tilde{A}_N + c^T \mathrm{vech}(\hat{\Sigma}_{T,t}))\} - E\exp\{i\xi\tilde{A}_N\}\, E\exp\{i\xi c^T \mathrm{vech}(\hat{\Sigma}_{T,t})\}\Big| \le 32\,\alpha(a_N - n) \to 0$$

for any $\xi \in \mathbb{R}$. This proves (i). 

Now, we prove (ii). From the proof of Theorem 2 we know that 

$$\mathrm{var}(\vartheta_{N,k}) = 2h^{-1}\nu_0\, p(x)\, a^T P_D\, \Sigma(x) \otimes \Sigma(x)\, P_D^T a\,(1 + o(1)),$$

and $\mathrm{cov}(\vartheta_{N,1}, \vartheta_{N,\ell+1}) = 0$, $\forall \ell \ge 1$. Therefore, 

$$\mathrm{var}\big(\sqrt{Nh}\,[A_N - \tilde{A}_N]\big) = \frac{2 a_N \nu_0}{p(x) N}\, a^T P_D\, \Sigma(x) \otimes \Sigma(x)\, P_D^T a\,(1 + o(1)) \to 0.$$

This along with $E[\vartheta_{N,k}] = 0$ gives 

$$\sqrt{Nh}\,[A_N - \tilde{A}_N] \xrightarrow{P} 0,$$

which completes the proof of (ii). Combining (i) and (ii) entails that $\sqrt{Nh}\, A_N$ and 
$\sqrt{n}\, c^T \mathrm{vech}(\hat{\Sigma}_{T,t})$ are asymptotically independent. This together with Theorem 1 and 
the asymptotic normality of $\sqrt{Nh}\, A_N$ shown in the proof of Theorem 2 completes the 
proof of Theorem 3. ■ 
FIGURE LEGENDS 

Figure 1. Illustration of time- and state-domain estimation. (a) The yields of 1-year, 
5-year, and 10-year treasury bills from 1962 to 2005. The vertical bar indicates 
localization in time, and the horizontal bar represents localization in state of the 5-year 
treasury bill process. (b) Illustration of time-domain smoothing: 1-year yield differences 
are plotted against 10-year yield differences with the regression line superimposed. (c) 
Illustration of the state-domain smoothing: 1-year yield differences are plotted against 
10-year yield differences for those periods with the corresponding 5-year yields restricted 
to the interval 6.37% ± .2%, indicated by the horizontal bar in (a). 

Figure 2. Functions $A(\tau)$ (solid curve) and $B(\tau)$ (dashed curve) for the parameters 
given in the simulation. 

Figure 3. (a) The averages of the entropy losses over 500 simulations for the 
time-domain estimation (dotted curve), state-domain estimation (dashed curve), and 
aggregated method (solid curve). (b) The standard deviations of the entropy losses over 
500 simulations for time-domain estimation (dotted curve), state-domain estimation 
(dashed curve), and the aggregated method (solid curve). (c) and (d): The same as in 
(a) and (b) except using the quadratic loss. 

Figure 4. (a) Box plots of the entropy losses over 500 simulations for the 
time-domain estimator (left), the aggregated method (middle), and the state-domain 
estimator (right). (b) and (c): The same as in (a) except that the quadratic loss and PE 
are used, respectively. (d) The ratios of the averages of the quadratic losses over 150 
out-sample forecasts using the time-domain and state-domain estimators to those 
based on the aggregated estimator (x-axis) are plotted against the ratios of the PEs 
based on the time-domain and state-domain estimators to those based on the 
aggregated estimator (y-axis). 

Figure 5. Correlation of the time-domain estimator and state-domain estimator for 
the volatility of an equally weighted portfolio. The dashed curves are for the 95% 
confidence intervals. The straight lines are acceptance regions for testing the null 
hypothesis that the correlation is zero at significance level 5%. 
FOOTNOTE



Footnote 1. By "stationarity" we do not mean that the process is strongly stationary,
but that it has some structural invariability over time. For example, the conditional
moment functions do not vary over time.

Footnote 2. Ledoit and Wolf (2003) introduce a shrinkage estimator by combining 
the sample covariance estimator with that derived from the CAPM. Their procedure
intends to improve the estimated covariance matrix by pulling the sample covariance towards
the estimate based on the CAPM. Their basic assumption is that the return vectors 
are i.i.d. across time. This usually holds approximately when the data are localized in 
time. In this sense, their estimator can be regarded as a time-domain estimator. 

Footnote 3. We prove in Section 4 that $\hat{\Sigma}_{S,t}$ and $\hat{\Sigma}_{T,t}$ are asymptotically independent,
and thus they are close to independent in finite samples. In the following, we use
"nearly independent" and "almost uncorrelated" interchangeably.

Footnote 4. In practice, one can take the yield process of the bond with a medium term
to maturity as the driving factor, as this bond is highly correlated with both short-term
and long-term bonds.

Footnote 5. The kernel function is a probability density, and the bandwidth is its
associated scale parameter. Both of them are used to localize the linear regression
around the given point $x_0$. The commonly used kernel functions are the Gaussian
density and the Epanechnikov kernel $K(x) = 0.75(1 - x^2)_+$.
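
The localization described in Footnote 5 can be illustrated with a minimal Python sketch. It assumes the Epanechnikov kernel and the usual scaled form K((x - x0)/h)/h for the weights. The function names and the sample yields are hypothetical, chosen only to mirror the 6.37% ± 0.2% window of Figure 1.

```python
def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2)_+."""
    return 0.75 * max(1.0 - u * u, 0.0)


def local_weights(xs, x0, h):
    """Kernel weights K((x - x0) / h) / h that localize a regression around x0."""
    return [epanechnikov((x - x0) / h) / h for x in xs]


# Hypothetical 5-year yields (in percent); x0 and h mimic the window in Figure 1(c).
yields = [5.90, 6.20, 6.37, 6.50, 6.80]
weights = local_weights(yields, x0=6.37, h=0.2)

for y, w in zip(yields, weights):
    print(f"yield={y:.2f}  weight={w:.4f}")
```

Observations farther than one bandwidth from x0 receive zero weight, so only periods with nearby states enter the local regression.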

Footnote 6. The stationarity condition of $f_t$ in Assumption 3 can be weakened to
Harris recurrence. See Bandi and Moloche (2004) for the asymptotic normality of the
local constant estimator under a recurrence assumption.

Footnote 7. The optimal choice of weight is proportional to the effective number
of data points used for the state-domain and time-domain smoothing. It always
outperforms the choice with $\omega_t = 1$ (state-domain estimator) or $\omega_t = 0$ (time-domain
estimator).
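
The effect of the weight in Footnote 7 can be seen in a scalar sketch. It assumes two uncorrelated, unbiased estimators of the same quantity with known variances, which is an illustrative simplification and not the matrix-valued dynamic weights derived in the paper. Under that assumption, the variance of w * state + (1 - w) * time is minimized at w = var_time / (var_state + var_time). The function names and variance values below are hypothetical.

```python
def optimal_weight(var_state, var_time):
    """Weight on the state-domain estimator that minimizes the variance of
    w * est_state + (1 - w) * est_time when the two are uncorrelated."""
    return var_time / (var_state + var_time)


def aggregated_variance(w, var_state, var_time):
    """Variance of the weighted combination of two uncorrelated estimators."""
    return w ** 2 * var_state + (1.0 - w) ** 2 * var_time


# Hypothetical variances of the two estimators at a given time point.
var_state, var_time = 4.0, 1.0

w_opt = optimal_weight(var_state, var_time)
v_opt = aggregated_variance(w_opt, var_state, var_time)
v_state_only = aggregated_variance(1.0, var_state, var_time)  # w = 1
v_time_only = aggregated_variance(0.0, var_state, var_time)   # w = 0

print(w_opt, v_opt, v_state_only, v_time_only)
```

If each variance is inversely proportional to the effective number of observations behind it, this weight is proportional to the effective number of state-domain observations, which matches the description in the footnote.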

Footnote 8. The choice comes from the recommendation of the RiskMetrics of J. P.
Morgan. The parameter $\lambda$ can also be chosen automatically from the data by using the
prediction error, as in Fan, Jiang, Zhang and Zhou (2003). Since we compare the relative
performance of the time-domain estimator and the aggregated estimator, we opt
for this simple choice. The results are not expected to change much when a data-driven
technique is used.

Footnote 9. Here we add normal noise to make the model more realistic. Our method 
performs even better without noise. Since the noise vectors are i.i.d. across time and the 
standard deviations are small, adding them to the original time series does not change 
the whole structure. Hence, our theory can carry through under contamination. 

Footnote 10. With $\lambda = 0.94$, the last data point used in the time domain has an
extra weight $0.94^{104} \approx 0.0016$, which is very small. Hence, we essentially include all
the effective data points.
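
The arithmetic in Footnote 10 can be checked directly. The sketch below assumes exponential-smoothing weights proportional to lambda^j at lag j, with lambda = 0.94 and 104 as the largest lag; the variable names are illustrative.

```python
lam = 0.94     # decay parameter from the footnote
max_lag = 104  # lag of the oldest data point used in the time domain

# Relative weight of the oldest data point; the footnote reports about 0.0016.
tail_weight = lam ** max_lag

# Share of the total weight of an infinite geometric scheme carried by lags 0..max_lag.
covered_share = 1.0 - lam ** (max_lag + 1)

print(f"tail weight   = {tail_weight:.6f}")
print(f"covered share = {covered_share:.6f}")
```

A tail weight this small is why truncating the window at this point discards almost none of the effective data.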

Footnote 11. Europe used several common currencies prior to the introduction of the 
Euro. The European Currency Unit (ECU) was used from January 1, 1979 to January 
1, 1999, when the Euro replaced the European Currency Unit at par. 
(a) Yields of Treasury Bonds From 1962 to 2005

Figure 1: Illustration of time- and state-domain estimation. (a) The yields of 1-year, 5-year and 10-year
treasury bills from 1962 to 2005. The vertical bar indicates localization in time, and the horizontal bar represents
localization in the state of the 5-year treasury bill process. (b) Illustration of time-domain smoothing: 1-year
yield differences are plotted against 10-year yield differences with the regression line superimposed. (c)
Illustration of the state-domain smoothing: 1-year yield differences are plotted against 10-year yield differences
for those periods with the corresponding 5-year yields restricted to the interval 6.37% ± 0.2%, indicated by the
horizontal bar in (a).

Figure 2: Functions A(t) (solid curve) and B(t) (dashed curve) for the parameters given in the simulation.

Figure 3: (a) The averages of the entropy losses over 500 simulations for the time-domain estimation (dotted
curve), state-domain estimation (dashed curve) and aggregated method (solid curve). (b) The standard
deviations of the entropy losses over 500 simulations for the time-domain estimation (dotted curve), state-domain
estimation (dashed curve) and aggregated method (solid curve). (c) and (d): The same as in (a) and (b) except
using the quadratic loss.

Figure 4: (a) Box plots of the entropy losses over 500 simulations for the time-domain estimator (left),
the aggregated method (middle), and the state-domain estimator (right). (b) and (c): The same as in (a)
except that the quadratic loss and PE are used, respectively. (d) The ratios of the averages of the quadratic
losses over 150 out-of-sample forecasts using the time-domain and state-domain estimators to those based on
the aggregated estimator (x-axis) are plotted against the ratios of the PEs based on the time-domain and
state-domain estimators to those based on the aggregated estimator (y-axis).

Figure 5: Correlation of the time-domain estimator and state-domain estimator for the volatility of an equally
weighted portfolio. The dashed curves are for the 95% confidence intervals. The straight lines are acceptance
regions for testing the null hypothesis that the correlation is zero at significance level 5%.